• Title/Summary/Keyword: Natural Language Understanding (자연어 이해)


Analyzing Korean Math Word Problem Data Classification Difficulty Level Using the KoEPT Model (KoEPT 기반 한국어 수학 문장제 문제 데이터 분류 난도 분석)

  • Rhim, Sangkyu;Ki, Kyung Seo;Kim, Bugeun;Gweon, Gahgene
    • KIPS Transactions on Software and Data Engineering / v.11 no.8 / pp.315-324 / 2022
  • In this paper, we propose KoEPT, a Transformer-based generative model for automatically solving math word problems. A math word problem is written in human language and describes an everyday situation in mathematical form. Solving one requires an artificial intelligence model to understand the logic implied in the problem, so the task is studied worldwide as a way to improve the language understanding ability of artificial intelligence. For Korean, studies so far have mainly attempted to solve problems by classifying them into templates, but these techniques are difficult to apply to datasets with high classification difficulty. To address this limitation, this paper uses the KoEPT model, which relies on 'expression' tokens and pointer networks. To measure the performance of the model, we first measured the classification difficulty scores of IL, CC, and ALG514, which are existing Korean math word problem datasets, and then evaluated KoEPT using 5-fold cross-validation. On these datasets, KoEPT obtained state-of-the-art (SOTA) performance: 99.1% on CC, which is comparable to the existing SOTA, and 89.3% and 80.5% on IL and ALG514, respectively. The evaluation also showed that KoEPT's performance was relatively improved on datasets with high classification difficulty. An ablation study revealed that the 'expression' tokens and pointer networks contributed to KoEPT being less affected by classification difficulty while still performing well.

A Study on Improving Performance of Software Requirements Classification Models by Handling Imbalanced Data (불균형 데이터 처리를 통한 소프트웨어 요구사항 분류 모델의 성능 개선에 관한 연구)

  • Jong-Woo Choi;Young-Jun Lee;Chae-Gyun Lim;Ho-Jin Choi
    • KIPS Transactions on Software and Data Engineering / v.12 no.7 / pp.295-302 / 2023
  • Software requirements written in natural language may have different meanings depending on the stakeholder's viewpoint. When designing an architecture based on quality attributes, quality attribute requirements must be classified accurately, because efficient design is possible only when appropriate architectural tactics are selected for each quality attribute. Although many natural language processing models have been studied for requirements classification, which is a high-cost task, few studies address improving classification performance on imbalanced quality attribute datasets. In this study, we first show through experiments that a classification model can automatically classify a Korean requirements dataset. Based on these results, we explain that data augmentation through EDA (Easy Data Augmentation) techniques and undersampling strategies can mitigate the imbalance of quality attribute datasets, and we show that they are effective in classifying requirements. The F1-score improved by 5.24%p, indicating that handling imbalanced data helps classification models classify Korean requirements. Furthermore, detailed experiments on EDA identify the operations that help improve classification performance.
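
The two imbalance-handling ideas named in this abstract, undersampling and EDA-style augmentation, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy requirement sentences, and the labels are hypothetical, and only two of the EDA operations (random swap and random deletion) are shown.

```python
import random
from collections import defaultdict


def random_swap(tokens, n_swaps, rng):
    """EDA-style random swap: exchange two randomly chosen positions n_swaps times."""
    tokens = list(tokens)
    if len(tokens) < 2:
        return tokens
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(tokens)), 2)
        tokens[i], tokens[j] = tokens[j], tokens[i]
    return tokens


def random_deletion(tokens, p, rng):
    """EDA-style random deletion: drop each token with probability p, keeping at least one."""
    kept = [t for t in tokens if rng.random() >= p]
    return kept if kept else [rng.choice(list(tokens))]


def undersample(samples, rng):
    """Randomly trim every class down to the size of the smallest class."""
    by_label = defaultdict(list)
    for text, label in samples:
        by_label[label].append((text, label))
    target = min(len(items) for items in by_label.values())
    balanced = []
    for items in by_label.values():
        balanced.extend(rng.sample(items, target))
    return balanced


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the sketch is repeatable
    # Hypothetical toy dataset: 5 "performance" requirements, 2 "security" ones.
    data = [("the system shall respond within two seconds", "performance")] * 5
    data += [("the system shall encrypt stored passwords", "security")] * 2
    print(len(undersample(data, rng)))  # size of the balanced set
    tokens = data[-1][0].split()
    print(random_swap(tokens, 1, rng))
    print(random_deletion(tokens, 0.2, rng))
```

In practice the augmentation would typically be applied to the minority classes and the undersampling to the majority classes; how the two are combined in the paper is not stated in the abstract.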

Analysis of Research Trends in Deep Learning-Based Video Captioning (딥러닝 기반 비디오 캡셔닝의 연구동향 분석)

  • Lyu Zhi;Eunju Lee;Youngsoo Kim
    • KIPS Transactions on Software and Data Engineering / v.13 no.1 / pp.35-49 / 2024
  • Video captioning technology, a significant outcome of the integration of computer vision and natural language processing, has emerged as a key research direction in the field of artificial intelligence. It aims at automatic understanding and language expression of video content, enabling computers to transform the visual information in videos into text. This paper provides an initial analysis of research trends in deep learning-based video captioning, categorizes the models into four main groups (CNN-RNN-based, RNN-RNN-based, multimodal-based, and Transformer-based), and explains the concept of each group along with its features, advantages, and disadvantages. The paper also lists the datasets and performance evaluation methods commonly used in the video captioning field. The datasets cover diverse domains and scenarios, offering extensive resources for training and validating video captioning models. The section on performance evaluation describes the major evaluation metrics and gives researchers practical references for evaluating model performance from various angles. Finally, the paper discusses future research tasks for video captioning. Major challenges that still require improvement include maintaining temporal consistency and accurately describing dynamic scenes, both of which increase complexity in real-world applications, and new tasks that need to be studied include temporal relationship modeling and multimodal data integration.

Artificial Intelligence and College Mathematics Education (인공지능(Artificial Intelligence)과 대학수학교육)

  • Lee, Sang-Gu;Lee, Jae Hwa;Ham, Yoonmee
    • Communications of Mathematical Education / v.34 no.1 / pp.1-15 / 2020
  • Today's healthcare, intelligent robots, smart home systems, and car sharing are already being transformed by cutting-edge information and communication technologies such as Artificial Intelligence (AI), the Internet of Things, the Internet of Intelligent Things, and big data, and these technologies deeply affect our lives. In factories, robots have been working in place of humans for several decades (FA, OA); AI doctors work in hospitals (Dr. Watson); and AI speakers (Giga Genie) and AI assistants (Siri, Bixby, Google Assistant) are working to improve natural language processing. Knowledge of mathematics has now become essential, not optional, for understanding AI. Mathematicians have thus been given the role of explaining the mathematics that makes these things possible behind AI. The authors therefore wrote a textbook, 'Basic Mathematics for Artificial Intelligence', which arranges the mathematical concepts and tools needed to understand AI and machine learning so that they can be covered in one or two semesters, and organized lectures for undergraduate and graduate students of various majors who wish to explore careers in artificial intelligence. In this paper, we share our experience of teaching this class; the full contents are available at http://matrix.skku.ac.kr/math4ai/.

Verarbeitungsprozess der Bedeutungen von sprachlichen Ausdrücken (언어표현에 나타난 의미의 처리과정)

  • OH Young Hun
    • Koreanishche Zeitschrift fur Deutsche Sprachwissenschaft / v.3 / pp.277-301 / 2001
  • The language we use so casually in fact involves a very complex process. Each lexical entry in the dictionary serves as the initial stage of interaction in a conversational situation, and an expression marks a kind of unity that arises from past or present conversational situations and from the utterances of the participants. Communicating means establishing conversational connections by means of the various meanings of words and sentences and the expressions referred to by each concept. All of this is related to the diversity of meaning in communication. Through expressions we can observe very complex and varied aspects, because the same expressions can be understood differently depending on the content of the conversation. Language shows only part of the language processing that people perform. What matters in processing language is a very complex and constructive process. In the course of communication, the listener forms an internal representation of the situation by processing the information given along with the speaker. The listener therefore tries to understand the meaning of an expression and draws on knowledge of various kinds, for example by applying syntactic and semantic knowledge, context-appropriate conversational knowledge, or general knowledge to fit the conversational situation. Deictic expressions provide the basis for determining the lexically fixed meaning of a word, or for assigning an accurate and appropriate referent to that meaning. Person, place, and time deixis (Personal-, Lokal-, Temporaldeixis) form a language system, and speaker and listener communicate by representing these expressions as person, place, and time. Vigorous research in psychology and linguistics is therefore needed on the various expressions that appear in natural language processing.


Preprocessing Technique for Malicious Comments Detection Considering the Form of Comments Used in the Online Community (온라인 커뮤니티에서 사용되는 댓글의 형태를 고려한 악플 탐지를 위한 전처리 기법)

  • Kim Hae Soo;Kim Mi Hui
    • KIPS Transactions on Computer and Communication Systems / v.12 no.3 / pp.103-110 / 2023
  • With the spread of the Internet, communities for communication between people became active and anonymous communities emerged along with them, and many users exploit anonymity to harm others, for example by writing aggressive posts and comments. In the past, administrators checked posts and comments directly and then deleted or blocked them, but as the number of community users increased, the volume grew beyond what administrators could keep monitoring. Word filtering was used at first: a post or comment could not be submitted if it contained a specific word. Users, however, bypassed the filter, for example by using similar words. To solve this problem, deep learning was used to monitor user posts in real time, but communities have recently begun using expressions that are understandable only within the community or only to a human reader, and that are not ordinary Korean words. The types and forms of such characters are so varied that it is difficult for an artificial intelligence model to learn them all. Therefore, in this paper, we propose a preprocessing technique in which each character of a sentence is converted into an image and passed to a CNN model that has learned images of Korean consonants, vowels, and spacing, so that characters understandable only from a human perspective are replaced with the characters predicted by the CNN model. Experiments confirmed that the proposed preprocessing technique increased the performance of the LSTM, BiLSTM, and CNN-BiLSTM models by 3.2%, 3.3%, and 4.88%, respectively.

Understanding of Generative Artificial Intelligence Based on Textual Data and Discussion for Its Application in Science Education (텍스트 기반 생성형 인공지능의 이해와 과학교육에서의 활용에 대한 논의)

  • Hunkoog Jho
    • Journal of The Korean Association For Science Education / v.43 no.3 / pp.307-319 / 2023
  • This study aims to explain the key concepts and principles of text-based generative artificial intelligence (AI) that has been receiving increasing interest and utilization, focusing on its application in science education. It also highlights the potential and limitations of utilizing generative AI in science education, providing insights for its implementation and research aspects. Recent advancements in generative AI, predominantly based on transformer models consisting of encoders and decoders, have shown remarkable progress through optimization of reinforcement learning and reward models using human feedback, as well as understanding context. Particularly, it can perform various functions such as writing, summarizing, keyword extraction, evaluation, and feedback based on the ability to understand various user questions and intents. It also offers practical utility in diagnosing learners and structuring educational content based on provided examples by educators. However, it is necessary to examine the concerns regarding the limitations of generative AI, including the potential for conveying inaccurate facts or knowledge, bias resulting from overconfidence, and uncertainties regarding its impact on user attitudes or emotions. Moreover, the responses provided by generative AI are probabilistic based on response data from many individuals, which raises concerns about limiting insightful and innovative thinking that may offer different perspectives or ideas. In light of these considerations, this study provides practical suggestions for the positive utilization of AI in science education.

Multi-Document Summarization Method Based on Semantic Relationship using VAE (VAE를 이용한 의미적 연결 관계 기반 다중 문서 요약 기법)

  • Baek, Su-Jin
    • Journal of Digital Convergence / v.15 no.12 / pp.341-347 / 2017
  • As the amount of document data increases, users need summarized information to understand documents. However, existing document summarization methods rely on overly simple statistics, and there has been insufficient research on multi-document summarization that handles sentence ambiguity and generates meaningful sentences. In this paper, we investigate semantic connection and a preprocessing step that handles unnecessary information. Based on lexical semantic pattern information, we propose a multi-document summarization method that uses a VAE to enhance semantic connectivity between sentences. Using sentence word vectors, the method reconstructs sentences after learning from the compressed information and attribute discriminators generated as latent variables, and semantic connection processing then generates a natural summary sentence. Compared with other document summarization methods, the proposed method showed a slight improvement in performance, indicating that semantic sentence generation and connectivity can be increased. In the future, we will study how to extend semantic connections by experimenting with various attribute settings.

An Interactive Search Agent based on DotQuery (닷큐어리를 활용한 대화형 검색 에이전트)

  • Kim Sun-Ok
    • Journal of the Korea Society of Computer and Information / v.11 no.4 s.42 / pp.271-281 / 2006
  • Due to the development of the Internet, the number of online documents and the amount of web services are increasing dramatically. However, several procedures are required before users actually find what they are looking for. These procedures are necessary for Internet users, but they make searching time-consuming. To systematize and simplify this repetitive work, this paper suggests a DotQuery-based interactive search agent. The agent enables a user to search a large amount of information from his or her computer through the DotQuery service, which includes natural language, and it executes the required procedures on the user's behalf. The agent also functions as a plug-in service within general web browsers such as Internet Explorer and decodes the DotQuery service. It then analyzes the user's DotQuery with its own program and acquires service results through multiple browsers of its own.


Analyzing Correlations between Movie Characters Based on Deep Learning

  • Jin, Kyo Jun;Kim, Jong Wook
    • Journal of the Korea Society of Computer and Information / v.26 no.10 / pp.9-17 / 2021
  • Humans are social animals that obtain information and interact socially through dialogue. In conversation, the mood of a word can change depending on how one person feels toward another. Relationships between characters in films are essential for understanding the story and the lines exchanged between characters, but methods for extracting this information from films have not been investigated. We therefore need a model that automatically analyzes the relationships in a movie. In this paper, we propose a method that analyzes the relationships between movie characters by using deep learning techniques to measure the emotion of each character pair. The proposed method first extracts the main characters from the movie script and finds the dialogue between them. Then, to analyze the relationship between the main characters, it performs sentiment analysis, weights the results according to the positions of the dialogue lines within the overall time span, and aggregates the scores. Experimental results on real datasets demonstrate that the proposed scheme can effectively measure the emotional relationship between the main characters.