• Title/Summary/Keyword: evaluation criteria for LLM


A Proposal of Evaluation of Large Language Models Built Based on Research Data (연구데이터 관점에서 본 거대언어모델 품질 평가 기준 제언)

  • Na-eun Han;Sujeong Seo;Jung-ho Um
    • Journal of the Korean Society for Information Management / v.40 no.3 / pp.77-98 / 2023
  • Large Language Models (LLMs) are becoming the major trend in the natural language processing field. These models were built on research data, but information such as the types, limitations, and risks of the research data used is unknown. This research presents how to analyze and evaluate, from the perspective of research data, LLMs that were built with research data: LLaMA and LLaMA-based models such as Alpaca from Stanford and Vicuna from the Large Model Systems Organization, and ChatGPT from OpenAI. The quality evaluation focuses on the validity, functionality, and reliability criteria of Data Quality Management (DQM). Furthermore, we adopted the Holistic Evaluation of Language Models (HELM) to understand its evaluation criteria and then discussed its limitations. This study presents quality evaluation criteria for LLMs using research data and future development directions.

Proposal for the Utilization and Refinement Techniques of LLMs for Automated Research Generation (관련 연구 자동 생성을 위한 LLM의 활용 및 정제 기법 제안)

  • Seung-min Choi;Yu-chul Jung
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.17 no.4 / pp.275-287 / 2024
  • Research on the integration of Knowledge Graphs (KGs) and Language Models (LMs) has been consistently explored over the years. However, studies focusing on the automatic generation of text using the structured knowledge from KGs have not been as widely developed. In this study, we propose a methodology for automatically generating domain-specific related research sections (Related Work) at a level comparable to existing papers. This methodology involves: 1) selecting optimal prompts, 2) extracting triples through a four-step refinement process, 3) constructing a knowledge graph, and 4) automatically generating related research. The proposed approach utilizes GPT-4, one of the large language models (LLMs), and is designed to automatically generate related research by applying the four-step refinement process. In triple extraction, the model scored 17.3, 14.1, and 4.2 on #Supp, #Cont, and Fluency, respectively. According to the GPT-4 automatic evaluation criteria, the model's performance improved from 88.5 points before refinement to 96.5 points after refinement out of 100, indicating a significant capability to automatically generate related research at a level similar to that of existing papers.
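
As a rough illustration of step 3 in the abstract above (constructing a knowledge graph from extracted triples), the following is a minimal sketch and not the authors' implementation. The abstract does not describe the four-step refinement, so the `normalize` rule, the `build_graph` helper, and the sample triples are all hypothetical assumptions chosen only to show the general shape of the idea.

```python
from collections import defaultdict


def normalize(term: str) -> str:
    """Collapse whitespace and lowercase a term (an assumed cleanup rule)."""
    return " ".join(term.split()).lower()


def build_graph(triples):
    """Build an adjacency map: subject -> set of (relation, object) pairs."""
    graph = defaultdict(set)
    for subj, rel, obj in triples:
        s, r, o = normalize(subj), normalize(rel), normalize(obj)
        if not (s and r and o):
            continue  # drop triples with an empty component
        graph[s].add((r, o))  # sets remove duplicates automatically
    return dict(graph)


# Hypothetical triples, standing in for the output of an LLM extraction step.
triples = [
    ("GPT-4", "is a", "large language model"),
    ("gpt-4 ", "is a", "Large Language Model"),  # duplicate after normalization
    ("knowledge graph", "stores", "triples"),
    ("", "stores", "nothing"),  # incomplete, dropped
]
graph = build_graph(triples)
```

Storing edges in a set keeps the sketch short; a real pipeline would likely need entity linking and relation filtering that go well beyond this simple normalization.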