• Title/Summary/Keyword: large-language model

Search Result 279, Processing Time 0.031 seconds

Recent R&D Trends for Pretrained Language Model (딥러닝 사전학습 언어모델 기술 동향)

  • Lim, J.H.;Kim, H.K.;Kim, Y.K.
    • Electronics and Telecommunications Trends
    • /
    • v.35 no.3
    • /
    • pp.9-19
    • /
    • 2020
  • Recently, a technique for applying a deep learning language model pretrained from a large corpus to fine-tuning for each application task has been widely used as a language processing technology. The pretrained language model shows higher performance and satisfactory generalization performance than existing methods. This paper introduces the major research trends related to deep learning pretrained language models in the field of language processing. We describe in detail the motivations, models, learning methods, and results of the BERT language model that had significant influence on subsequent studies. Subsequently, we introduce the results of language model studies after BERT, focusing on SpanBERT, RoBERTa, ALBERT, BART, and ELECTRA. Finally, we introduce the KorBERT pretrained language model, which shows satisfactory performance in Korean language. In addition, we introduce techniques on how to apply the pretrained language model to Korean (agglutinative) language, which consists of a combination of content and functional morphemes, unlike English (refractive) language whose endings change depending on the application.

Utilizing Large Language Models for Non-trained Binary Sentiment Classification (거대 언어 모델(LLM)을 이용한 비훈련 이진 감정 분류)

  • Hyungjin Ahn;Taewook Hwang;Sangkeun Jung
    • Annual Conference on Human and Language Technology
    • /
    • 2023.10a
    • /
    • pp.66-71
    • /
    • 2023
  • ChatGPT가 등장한 이후 다양한 거대 언어 모델(Large Language Model, LLM)이 등장하였고, 이러한 LLM을 목적에 맞게 파인튜닝하여 사용할 수 있게 되었다. 하지만 LLM을 새로 학습하는 것은 물론이고, 단순 튜닝만 하더라도 일반인은 시도하기 어려울 정도의 많은 컴퓨팅 자원이 필요하다. 본 연구에서는 공개된 LLM을 별도의 학습 없이 사용하여 zero-shot 프롬프팅으로 이진 분류 태스크에 대한 성능을 확인하고자 했다. 학습이나 추가적인 튜닝 없이도 기존 선학습 언어 모델들에 준하는 이진 분류 성능을 확인할 수 있었고, 성능이 좋은 LLM의 경우 분류 실패율이 낮고 일관적인 성능을 보여 상당히 높은 활용성을 확인하였다.

  • PDF

DeNERT: Named Entity Recognition Model using DQN and BERT

  • Yang, Sung-Min;Jeong, Ok-Ran
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.4
    • /
    • pp.29-35
    • /
    • 2020
  • In this paper, we propose a new structured entity recognition DeNERT model. Recently, the field of natural language processing has been actively researched using pre-trained language representation models with a large amount of corpus. In particular, the named entity recognition, which is one of the fields of natural language processing, uses a supervised learning method, which requires a large amount of training dataset and computation. Reinforcement learning is a method that learns through trial and error experience without initial data and is closer to the process of human learning than other machine learning methodologies and is not much applied to the field of natural language processing yet. It is often used in simulation environments such as Atari games and AlphaGo. BERT is a general-purpose language model developed by Google that is pre-trained on large corpus and computational quantities. Recently, it is a language model that shows high performance in the field of natural language processing research and shows high accuracy in many downstream tasks of natural language processing. In this paper, we propose a new named entity recognition DeNERT model using two deep learning models, DQN and BERT. The proposed model is trained by creating a learning environment of reinforcement learning model based on language expression which is the advantage of the general language model. The DeNERT model trained in this way is a faster inference time and higher performance model with a small amount of training dataset. Also, we validate the performance of our model's named entity recognition performance through experiments.

Instruction Tuning for Controlled Text Generation in Korean Language Model (Instruction Tuning을 통한 한국어 언어 모델 문장 생성 제어)

  • Jinhee Jang;Daeryong Seo;Donghyeon Jeon;Inho Kang;Seung-Hoon Na
    • Annual Conference on Human and Language Technology
    • /
    • 2023.10a
    • /
    • pp.289-294
    • /
    • 2023
  • 대형 언어 모델(Large Language Model)은 방대한 데이터와 파라미터를 기반으로 문맥 이해에서 높은 성능을 달성하였지만, Human Alignment를 위한 문장 생성 제어 연구는 아직 활발한 도전 과제로 남아있다. 본 논문에서는 Instruction Tuning을 통한 문장 생성 제어 실험을 진행한다. 자연어 처리 도구를 사용하여 단일 혹은 다중 제약 조건을 포함하는 Instruction 데이터 셋을 자동으로 구축하고 한국어 언어 모델인 Polyglot-Ko 모델에 fine-tuning 하여 모델 생성이 제약 조건을 만족하는지 검증하였다. 실험 결과 4개의 제약 조건에 대해 평균 0.88의 accuracy를 보이며 효과적인 문장 생성 제어가 가능함을 확인하였다.

  • PDF

A Corpus Selection Based Approach to Language Modeling for Large Vocabulary Continuous Speech Recognition (대용량 연속 음성 인식 시스템에서의 코퍼스 선별 방법에 의한 언어모델 설계)

  • Oh, Yoo-Rhee;Yoon, Jae-Sam;kim, Hong-Kook
    • Proceedings of the KSPS conference
    • /
    • 2005.11a
    • /
    • pp.103-106
    • /
    • 2005
  • In this paper, we propose a language modeling approach to improve the performance of a large vocabulary continuous speech recognition system. The proposed approach is based on the active learning framework that helps to select a text corpus from a plenty amount of text data required for language modeling. The perplexity is used as a measure for the corpus selection in the active learning. From the recognition experiments on the task of continuous Korean speech, the speech recognition system employing the language model by the proposed language modeling approach reduces the word error rate by about 6.6 % with less computational complexity than that using a language model constructed with randomly selected texts.

  • PDF

A Study on Dataset Generation Method for Korean Language Information Extraction from Generative Large Language Model and Prompt Engineering (생성형 대규모 언어 모델과 프롬프트 엔지니어링을 통한 한국어 텍스트 기반 정보 추출 데이터셋 구축 방법)

  • Jeong Young Sang;Ji Seung Hyun;Kwon Da Rong Sae
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.11
    • /
    • pp.481-492
    • /
    • 2023
  • This study explores how to build a Korean dataset to extract information from text using generative large language models. In modern society, mixed information circulates rapidly, and effectively categorizing and extracting it is crucial to the decision-making process. However, there is still a lack of Korean datasets for training. To overcome this, this study attempts to extract information using text-based zero-shot learning using a generative large language model to build a purposeful Korean dataset. In this study, the language model is instructed to output the desired result through prompt engineering in the form of "system"-"instruction"-"source input"-"output format", and the dataset is built by utilizing the in-context learning characteristics of the language model through input sentences. We validate our approach by comparing the generated dataset with the existing benchmark dataset, and achieve 25.47% higher performance compared to the KLUE-RoBERTa-large model for the relation information extraction task. The results of this study are expected to contribute to AI research by showing the feasibility of extracting knowledge elements from Korean text. Furthermore, this methodology can be utilized for various fields and purposes, and has potential for building various Korean datasets.

A Study on Visibility Graph Generating Model of Ada Program (Ada 프로그램의 Visibility Graph 생성모델에 관한 연구)

  • Jeong Jung-Yeong;Kim Hui-Ju;Yun Chang-Seop
    • Journal of the military operations research society of Korea
    • /
    • v.16 no.2
    • /
    • pp.56-74
    • /
    • 1990
  • Programming-in-the-Large refers to software development environment and includes the organization and representation of a system structure, module decomposition, component dependence analysis, seperate compilation, subsystem and composition identification. The most intricate problem in this environment is the mastery of the structural complexity of large software systems. Ada programming language is tailored to the needs for building of large, integrated software systems from many program units. The visibility graph generating model presented in this paper transforms Ada source program into a visibility graph with nodes for program units and edges for visibility relations among program units. The system description in terms of program units and their visibility relations produced by this model can be utilized for some apects of Programming-in-the-Large environment and also assists designeers, programmers, integrators and maintainers in defining, understanding and exploring the structure of evolving software systems. The model designed and implemented in Ada programming language runs on PCs and will remain useful both in practice and as experimental tool.

  • PDF

Subword Neural Language Generation with Unlikelihood Training

  • Iqbal, Salahuddin Muhammad;Kang, Dae-Ki
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.12 no.2
    • /
    • pp.45-50
    • /
    • 2020
  • A Language model with neural networks commonly trained with likelihood loss. Such that the model can learn the sequence of human text. State-of-the-art results achieved in various language generation tasks, e.g., text summarization, dialogue response generation, and text generation, by utilizing the language model's next token output probabilities. Monotonous and boring outputs are a well-known problem of this model, yet only a few solutions proposed to address this problem. Several decoding techniques proposed to suppress repetitive tokens. Unlikelihood training approached this problem by penalizing candidate tokens probabilities if the tokens already seen in previous steps. While the method successfully showed a less repetitive generated token, the method has a large memory consumption because of the training need a big vocabulary size. We effectively reduced memory footprint by encoding words as sequences of subword units. Finally, we report competitive results with token level unlikelihood training in several automatic evaluations compared to the previous work.