• Title/Summary/Keyword: natural language generation

Search results: 134

Design of Sentence Semantic Model for Cause-Effect Graph Automatic Generation from Natural Language Oriented Informal Requirement Specifications (비정형 요구사항으로부터 원인-결과 그래프 자동 발생을 위한 문장 의미 모델(Sentence Semantic Model) 설계)

  • Jang, Woo Sung;Jung, Se Jun;Kim, R.Young Chul
    • Annual Conference on Human and Language Technology
    • /
    • 2020.10a
    • /
    • pp.215-219
    • /
    • 2020
  • Many language analysis studies have been carried out in the field of Korean (Hangul) linguistics. In the requirements engineering area of software engineering, clear definition and analysis of requirements are needed, and extracting test cases from informal requirement specifications is a very important issue. In particular, there are few methods that automatically generate decision-table-based test cases from natural-language requirement specifications via a cause-effect graph. To solve this problem, Korean semantic analysis techniques need to be applied to the requirements engineering domain. This paper proposes a method for automatically generating a sentence semantic model from requirements, an intermediate step in generating test cases from informal requirements. This makes it possible to verify the correctness of the cause-effect graph generated from the requirements. (A minimal illustrative sketch of the cause-effect-graph-to-decision-table step follows this entry.)

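The pipeline outlined above goes from requirement sentences to a cause-effect graph and then to decision-table test cases. Below is a minimal, illustrative Python sketch of the final step only, with hand-written (cause, effect) pairs standing in for the output of the sentence semantic model; this is not the authors' implementation.

```python
from itertools import product

# Hypothetical, hand-parsed requirement: each tuple is (cause clause, effect clause).
# A real system would derive these from the sentence semantic model of the text.
requirement_pairs = [
    ("card inserted", "prompt for PIN"),
    ("PIN valid", "show account menu"),
]

# Cause-effect graph as an adjacency mapping: cause -> effects it triggers.
graph = {}
for cause, effect in requirement_pairs:
    graph.setdefault(cause, []).append(effect)

# Decision table: one column per cause, one row per combination of truth values.
causes = list(graph)
print("\t".join(causes + ["expected effects"]))
for values in product([True, False], repeat=len(causes)):
    active = dict(zip(causes, values))
    effects = [e for c, v in active.items() if v for e in graph[c]]
    row = ["T" if v else "F" for v in values] + ["; ".join(effects) or "-"]
    print("\t".join(row))
```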

Development of a Regulatory Q&A System for KAERI Utilizing Document Search Algorithms and Large Language Model (거대언어모델과 문서검색 알고리즘을 활용한 한국원자력연구원 규정 질의응답 시스템 개발)

  • Hongbi Kim;Yonggyun Yu
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.28 no.5
    • /
    • pp.31-39
    • /
    • 2023
  • The evolution of natural language processing (NLP) and the rise of large language models (LLMs) like ChatGPT have paved the way for specialized question-answering (QA) systems tailored to specific domains. This study outlines a system that harnesses an LLM in conjunction with document search algorithms to interpret and answer user inquiries using documents from the Korea Atomic Energy Research Institute (KAERI). Initially, the system refines multiple documents for optimized search and analysis, breaking the content into manageable paragraphs suitable for the language model's processing. Each paragraph is converted into a vector via an embedding model and archived in a database. Upon receiving a user query, the system matches the vector extracted from the question against the stored vectors, pinpointing the most pertinent content. The chosen paragraphs, combined with the user's query, are then processed by the language generation model to formulate a response. Tests encompassing a spectrum of questions verified the system's proficiency in discerning question intent, understanding diverse documents, and delivering rapid and precise answers. (A minimal retrieval sketch follows this entry.)
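The abstract describes a standard embedding-retrieval pipeline feeding an LLM. Below is a minimal sketch of the retrieval step, with a toy hashed bag-of-words function standing in for the embedding model and the final LLM call shown only as a prompt string; the paragraphs and query are invented, not taken from the KAERI corpus.

```python
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy embedding: hashed bag-of-words. A real system would call an
    embedding model here; only the retrieval mechanics are illustrated."""
    v = np.zeros(dim)
    for word in text.lower().split():
        v[hash(word) % dim] += 1.0
    return v / (np.linalg.norm(v) or 1.0)

# 1) Split the regulation documents into paragraphs and index their vectors.
paragraphs = [
    "Radiation work permits must be renewed annually.",
    "Visitors require an escort inside controlled areas.",
]
index = np.stack([embed(p) for p in paragraphs])

# 2) At query time, match the question vector against the stored vectors.
query = "How often must a radiation work permit be renewed?"
scores = index @ embed(query)                # unit vectors, so dot product = cosine
top = np.argsort(scores)[::-1][:1]
context = "\n".join(paragraphs[i] for i in top)

# 3) The retrieved context plus the query would then be passed to the LLM, e.g.:
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
print(prompt)
```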

Train Booking Agent with Adaptive Sentence Generation Using Interactive Genetic Programming (대화형 유전 프로그래밍을 이용한 적응적 문장생성 열차예약 에이전트)

  • Lim, Sung-Soo;Cho, Sung-Bae
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.12 no.2
    • /
    • pp.119-128
    • /
    • 2006
  • As dialogue systems become widely required, research on natural language generation in dialogue has attracted attention. In contrast to conventional dialogue systems that reply to the user with a set of predefined answers, a newly developed dialogue system generates answers dynamically and adapts them to support more flexible and customized dialogues with humans. This paper proposes an evolutionary method for generating sentences using interactive genetic programming. Sentence plan trees, which represent sentence structures, are adopted as the representation for genetic programming. Through an interactive evolution process with the user, a set of customized sentence structures is obtained. The proposed method is applied to a dialogue-based train booking agent, and a usability test demonstrates its usefulness. (A minimal evolutionary-loop sketch follows this entry.)
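A minimal sketch of the interactive evolutionary loop the abstract describes, with sentence plan trees simplified to a flat choice of phrase per slot; the slot names and phrases are invented for illustration and are not from the paper.

```python
import random

# Sentence plan "trees" simplified to slot -> phrase-index choices (invented data).
SLOTS = ["greeting", "confirm_date", "confirm_seat", "ask_payment"]
PHRASES = {
    "greeting": ["Hello.", "Thank you for calling the railway."],
    "confirm_date": ["You are travelling on {date}.", "Departure date: {date}."],
    "confirm_seat": ["A window seat has been reserved.", "Seat type: window."],
    "ask_payment": ["How would you like to pay?", "Please choose a payment method."],
}

def random_plan():
    return {slot: random.randrange(len(PHRASES[slot])) for slot in SLOTS}

def realize(plan, date="May 3"):
    return " ".join(PHRASES[slot][idx].format(date=date) for slot, idx in plan.items())

def crossover(a, b):
    return {slot: random.choice([a[slot], b[slot]]) for slot in SLOTS}

def mutate(plan, rate=0.25):
    return {slot: (random.randrange(len(PHRASES[slot])) if random.random() < rate else idx)
            for slot, idx in plan.items()}

population = [random_plan() for _ in range(4)]
for generation in range(3):
    scored = []
    for plan in population:                     # interactive step: the user rates each sentence
        print(realize(plan))
        scored.append((float(input("score 1-5: ")), plan))
    scored.sort(key=lambda pair: -pair[0])
    parents = [plan for _, plan in scored[:2]]  # keep the two best-rated plans
    population = parents + [mutate(crossover(*parents)) for _ in range(2)]
```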

Analysis of the Status of Natural Language Processing Technology Based on Deep Learning (딥러닝 중심의 자연어 처리 기술 현황 분석)

  • Park, Sang-Un
    • The Journal of Bigdata
    • /
    • v.6 no.1
    • /
    • pp.63-81
    • /
    • 2021
  • The performance of natural language processing is improving rapidly due to the recent development and application of machine learning and deep learning technologies, and as a result its field of application is expanding. In particular, as the demand for analysis of unstructured text data increases, interest in NLP (natural language processing) is also increasing. However, due to the complexity and difficulty of natural language preprocessing and of machine learning and deep learning theory, there are still high barriers to the use of natural language processing. In this paper, for an overall understanding of NLP, we examine the main fields of NLP that are currently being actively researched and the current state of major technologies centered on machine learning and deep learning, in order to provide a foundation for understanding and utilizing NLP more easily. We therefore trace the changing place of NLP within AI (artificial intelligence) through changes in the taxonomy of AI technology. The main areas of NLP, which consist of language modeling, text classification, text generation, document summarization, question answering, and machine translation, are explained together with state-of-the-art deep learning models. In addition, the major deep learning models used in NLP are explained, and data sets and evaluation measures for performance evaluation are summarized. We hope that researchers who want to utilize NLP for various purposes in their fields will be able to understand the overall technical status and main technologies of NLP through this paper.

A Study of Pre-trained Language Models for Korean Language Generation (한국어 자연어생성에 적합한 사전훈련 언어모델 특성 연구)

  • Song, Minchae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.28 no.4
    • /
    • pp.309-328
    • /
    • 2022
  • This study empirically analyzed Korean pre-trained language models (PLMs) designed for natural language generation. The performance of two PLMs - BART and GPT - on the task of abstractive text summarization was compared. To investigate how performance depends on the characteristics of the inference data, ten different document types, covering six types of informational content as well as creative content, were considered. It was found that BART (which can both understand and generate natural language) performed better than GPT (which can only generate). Upon more detailed examination of the effect of inference data characteristics, the performance of GPT was found to be proportional to the length of the input text. However, even for the longest documents (where GPT performs best), BART still outperformed GPT, suggesting that the greatest influence on downstream performance is not the size of the training data or the number of model parameters but the structural suitability of the PLM for the downstream task. The performance of the PLMs was also compared by analyzing part-of-speech (POS) shares. BART's performance was inversely related to the proportion of prefixes, adjectives, adverbs, and verbs, but positively related to that of nouns. This result emphasizes the importance of taking the inference data's characteristics into account when fine-tuning a PLM for its intended downstream task. (A minimal summarization sketch follows this entry.)
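Abstractive summarization with a seq2seq PLM such as BART is commonly run as sketched below with the Hugging Face transformers library; the checkpoint name is a placeholder rather than a model named in the paper, and the input text is left as a stub.

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Placeholder checkpoint: substitute a Korean BART summarization model of your choice.
checkpoint = "your-korean-bart-summarization-checkpoint"
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

document = "..."  # a long Korean source text to be summarized

inputs = tokenizer(document, return_tensors="pt", truncation=True, max_length=1024)
summary_ids = model.generate(**inputs, num_beams=4, max_length=128, early_stopping=True)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```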

A PageRank based Data Indexing Method for Designing Natural Language Interface to CRM Databases (분석 CRM 실무자의 자연어 질의 처리를 위한 기업 데이터베이스 구성요소 인덱싱 방법론)

  • Park, Sung-Hyuk;Hwang, Kyeong-Seo;Lee, Dong-Won
    • CRM연구
    • /
    • v.2 no.2
    • /
    • pp.53-70
    • /
    • 2009
  • Understanding consumer behavior through analysis of customer data is an essential part of analytic CRM. To do this, users need skills in data extraction and data processing. Because users pose many kinds of questions about consumer data, they must use a database language such as SQL. However, writing SQL statements is not easy for practitioners, because the accuracy of a query result depends heavily on knowledge of work-site operations and of the firm's database. This paper proposes a natural-language-based database search framework that finds relevant database elements. Specifically, we describe how our TableRank method can understand a user's natural-language query and provide the appropriate relations and attributes of data records to the user. Several experiments show that TableRank provides accurate database elements related to the user's natural-language query. We also show that short distances among relations in the database represent high data connectivity, which guarantees good matching with a user's search query. (A minimal PageRank-style scoring sketch follows this entry.)

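TableRank is described as a PageRank-based ranking over database elements. Below is a minimal sketch of PageRank-style scoring on a toy schema graph; the node names and edges are invented, and the query-matching part of TableRank is omitted.

```python
import numpy as np

# Toy schema graph: nodes are relations/attributes, edges are joins/containment links.
nodes = ["customer", "order", "product", "customer.name", "order.date"]
edges = [(0, 1), (1, 2), (0, 3), (1, 4)]           # undirected links

n = len(nodes)
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
M = A / A.sum(axis=0, keepdims=True)                # column-stochastic transition matrix

d, rank = 0.85, np.full(n, 1.0 / n)
for _ in range(50):                                 # power iteration
    rank = (1 - d) / n + d * M @ rank

for name, score in sorted(zip(nodes, rank), key=lambda pair: -pair[1]):
    print(f"{name:15s} {score:.3f}")
```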

A Study on the Development of Structural Analysis Program using MATLAB Language (MATLAB 언어를 이용한 구조해석 프로그램 개발에 관한 연구)

  • 배동명;강상중
    • Journal of the Korean Society of Fisheries and Ocean Technology
    • /
    • v.36 no.4
    • /
    • pp.347-353
    • /
    • 2000
  • This paper presents the structure and capabilities of a CAE program, along with the merits of MATLAB, which is widely used in engineering and the natural sciences. A frame-structure analysis program written in MATLAB, classified as a fourth-generation language, is presented. The proposed program follows the composition of a general CAE program, with pre-processing, solver, and post-processing stages, and can carry out static and eigenvalue analyses of truss structures and two-dimensional frame structures. For simple pre- and post-processing, it uses input and plot windows built from various MATLAB GUI functions. Each finite element required for the analysis is formulated by Galerkin's method, a kind of weighted-residual method. To check the program's results, the author's calculations for two-dimensional truss and frame structures were compared with those of the general-purpose structural analysis code ANSYS. (A minimal solver sketch follows this entry.)

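The solver stage for a truss can be illustrated with direct stiffness assembly. Below is a minimal Python sketch of a static 2D truss analysis; it is not the paper's MATLAB program, and the geometry, material properties, and loads are invented.

```python
import numpy as np

# Assemble the global stiffness matrix from bar elements and solve K u = f.
E, A = 210e9, 1e-4                       # Young's modulus [Pa], cross-section area [m^2]
nodes = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 1.0]])   # node coordinates [m]
elements = [(0, 2), (1, 2), (0, 1)]                        # bars by node index

ndof = 2 * len(nodes)
K = np.zeros((ndof, ndof))
for i, j in elements:
    dx, dy = nodes[j] - nodes[i]
    L = np.hypot(dx, dy)
    c, s = dx / L, dy / L
    k = (E * A / L) * np.outer([-c, -s, c, s], [-c, -s, c, s])   # bar stiffness in global axes
    dofs = [2 * i, 2 * i + 1, 2 * j, 2 * j + 1]
    K[np.ix_(dofs, dofs)] += k

f = np.zeros(ndof)
f[5] = -10e3                             # 10 kN downward load at node 2 (its y-dof)
fixed = [0, 1, 2, 3]                     # nodes 0 and 1 fully pinned
free = [d for d in range(ndof) if d not in fixed]

u = np.zeros(ndof)
u[free] = np.linalg.solve(K[np.ix_(free, free)], f[free])
print("nodal displacements [m]:\n", u.reshape(-1, 2))
```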

An Efficient Machine Learning-based Text Summarization in the Malayalam Language

  • P Haroon, Rosna;Gafur M, Abdul;Nisha U, Barakkath
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.6
    • /
    • pp.1778-1799
    • /
    • 2022
  • Automatic text summarization is a procedure that condenses a large text into a shorter one that retains the significant information. Malayalam is one of the most difficult languages used in parts of India, most commonly in Kerala and Lakshadweep. Natural language processing work in Malayalam is relatively limited due to the complexity of the language as well as the scarcity of available resources. In this paper, an approach is proposed for text summarization of Malayalam documents by training a model based on the Support Vector Machine classification algorithm. Different features of the text are taken into account when training the machine so that the system can output the most important information from the input text. The classifier assigns the most important, important, average, and least significant sentences to separate classes, and based on this the machine creates a summary of the input document. The user can select a compression ratio so that the system outputs that fraction of the document as the summary. Model performance is measured using different genres of Malayalam documents as well as documents from the same domain. The model is evaluated using the content evaluation measures precision, recall, F score, and relative utility. The obtained precision and recall values show that the model is reliable and more relevant than the other summarizers compared. (A minimal classification-based summarization sketch follows this entry.)
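A minimal sketch of the sentence-classification approach the abstract describes, using an SVM over TF-IDF features; the sentences, labels, and feature representation are invented stand-ins for the paper's Malayalam data and richer feature set.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import SVC

# Invented training data: each sentence is labelled with an importance class
# (3 = most important, 2 = important, 1 = average, 0 = least significant).
train_sentences = [
    "The committee approved the new water policy.",
    "The policy takes effect next month.",
    "Officials also discussed parking arrangements.",
    "Lunch was served after the meeting.",
]
train_labels = [3, 2, 1, 0]

vectorizer = TfidfVectorizer()
classifier = SVC(kernel="linear").fit(vectorizer.fit_transform(train_sentences), train_labels)

# Summarize a new document: predict each sentence's class, keep the top fraction.
document = [
    "The council announced a citywide recycling program.",
    "The program starts in June.",
    "Refreshments were provided at the briefing.",
]
compression_ratio = 0.5
predicted = classifier.predict(vectorizer.transform(document))
keep = max(1, int(len(document) * compression_ratio))
chosen = sorted(sorted(range(len(document)), key=lambda i: -predicted[i])[:keep])
print(" ".join(document[i] for i in chosen))
```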

On the Automatic Generation of Illustrations for Events in Storybooks: Representation of Illustrative Events (동화책에서의 삽화 자동 생성 -삽화를 위한 사건 표현)

  • Baek, Seung-Cheol;Lee, Hee-Jin;Park, Jong-C.
    • The HCI Society of Korea: Conference Proceedings
    • /
    • 2008.02a
    • /
    • pp.390-396
    • /
    • 2008
  • Storybooks, especially those for children, may contain illustrations. An automated system for generating illustrations would help the production process of storybook publishing. In this paper, we propose a method for automatically generating layouts of objects during illustration generation. Generated layouts should avoid unnecessary overlap between objects while respecting the spatial information given in the storybook. We first define a representation scheme for spatial information in natural language sentences using tree structures and predicate-argument structures. Unification of tree structures and the Region Connection Calculus are then used to manipulate this information and generate the corresponding illustrations. (A minimal layout sketch follows this entry.)

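The layout step relies on region relations such as those of the Region Connection Calculus. Below is a tiny sketch with axis-aligned boxes that checks only a simplified overlapping/disconnected distinction rather than the full RCC-8 relation set; the object names and the predicate-argument input are invented.

```python
from dataclasses import dataclass

@dataclass
class Box:
    name: str
    x: float
    y: float
    w: float
    h: float

def overlaps(a: Box, b: Box) -> bool:
    """True if the boxes share interior points (roughly RCC's 'O' relation)."""
    return a.x < b.x + b.w and b.x < a.x + a.w and a.y < b.y + b.h and b.y < a.y + a.h

def place_beside(scene, obj, step=0.5):
    """Greedy layout: slide the new object to the right until it is
    disconnected (RCC 'DC') from everything already placed."""
    while any(overlaps(obj, other) for other in scene):
        obj.x += step
    scene.append(obj)

# Invented predicate-argument input: beside(rabbit, tree), beside(fox, rabbit)
scene = [Box("tree", 0.0, 0.0, 2.0, 3.0)]
place_beside(scene, Box("rabbit", 0.0, 0.0, 1.0, 1.0))
place_beside(scene, Box("fox", 0.0, 0.0, 1.5, 1.0))
for box in scene:
    print(f"{box.name}: x={box.x}, y={box.y}")
```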

Domain Question Answering System (도메인 질의응답 시스템)

  • Yoon, Seunghyun;Rhim, Eunhee;Kim, Deokho
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.2
    • /
    • pp.144-147
    • /
    • 2015
  • Question Answering (QA) services can provide exact answers to user questions written in natural language form. This research focuses on how to build a QA system for a specific domain. The online and offline architecture of the targeted-domain QA system, covering domain detection, question analysis, reasoning, information retrieval, filtering, answer extraction, re-ranking, and answer generation, as well as data preparation, is presented herein. Test results with an official Frequently Asked Question (FAQ) set showed 68% top-1 accuracy and 77% top-5 accuracy. The contributions of each component, such as the question analysis system, document search engine, knowledge graph engine, and re-ranking module, to the final answer are also presented. (A minimal pipeline skeleton follows this entry.)
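A minimal skeleton of the staged pipeline the abstract lists (question analysis, retrieval, re-ranking, answer generation); every stage below is a toy placeholder rather than the authors' components, and the FAQ corpus and question are invented.

```python
# Each function is a stand-in for a real pipeline stage.
def analyze(question):
    return {"keywords": set(question.lower().replace("?", "").split())}

def retrieve(analysis, corpus):
    return [doc for doc in corpus
            if analysis["keywords"] & set(doc.lower().split())]

def rerank(candidates, analysis):
    # Stand-in scorer: rank candidates by keyword overlap with the question.
    return sorted(candidates,
                  key=lambda doc: -len(analysis["keywords"] & set(doc.lower().split())))

def generate_answer(best_passage, question):
    return f"According to the FAQ: {best_passage}"

faq_corpus = [
    "Refunds are available up to 24 hours before departure.",
    "Seat upgrades can be purchased at the station.",
]

question = "Can I get a refund before departure?"
analysis = analyze(question)
candidates = rerank(retrieve(analysis, faq_corpus), analysis)
print(generate_answer(candidates[0], question) if candidates else "No answer found.")
```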