• Title/Summary/Keyword: language processing

Using Syntax and Shallow Semantic Analysis for Vietnamese Question Generation

  • Phuoc Tran;Duy Khanh Nguyen;Tram Tran;Bay Vo
    • KSII Transactions on Internet and Information Systems (TIIS) / v.17 no.10 / pp.2718-2731 / 2023
  • This paper presents a method that uses syntax and shallow semantic analysis for Vietnamese question generation (QG). Specifically, the proposed technique investigates both the syntactic and the shallow semantic structure of each sentence. The main goal is to generate questions from a single sentence; the generated questions are factoid questions, which require short, fact-based answers. Syntax-based analysis is one of the most popular approaches in the QG field, but it requires expert linguistic knowledge and a deep understanding of Vietnamese syntax rules. It is therefore considered a high-cost, inefficient solution, because producing good syntax rules takes significant human effort. To deal with this problem, we collected Vietnamese syntax rules from a Vietnamese language textbook. We also used several natural language processing (NLP) techniques to analyze Vietnamese shallow syntax and semantics for the QG task: sentence segmentation, word segmentation, part-of-speech tagging, chunking, dependency parsing, and named entity recognition. We assessed the credibility of our model by human evaluation: we manually wrote questions from the corpus and compared them with the generated questions. The empirical evidence shows that the proposed technique performs well, generating questions that are very similar to those created by humans.

Sequence Labeling-based Multiple Causal Relations Extraction using Pre-trained Language Model for Maritime Accident Prevention (해양사고 예방을 위한 사전학습 언어모델의 순차적 레이블링 기반 복수 인과관계 추출)

  • Ki-Yeong Moon;Do-Hyun Kim;Tae-Hoon Yang;Sang-Duck Lee
    • Journal of the Korean Society of Safety / v.38 no.5 / pp.51-57 / 2023
  • Numerous studies have been conducted to analyze the causal relationships of maritime accidents using natural language processing techniques. However, when multiple causes and effects are associated with a single accident, the effectiveness of extracting these causal relations diminishes. To address this challenge, we compiled a dataset using verdicts from maritime accident cases in this study, analyzed their causal relations, and applied labeling considering the association information of various causes and effects. In addition, to validate the efficacy of our proposed methodology, we fine-tuned the KoELECTRA Korean language model. The results of our validation process demonstrated the ability of our approach to successfully extract multiple causal relationships from maritime accident cases.

A Study on the Web-based Map Algebraic Processor (웹 기반 지도대수 처리기에 관한 연구)

  • 박기호
    • Spatial Information Research / v.5 no.2 / pp.147-160 / 1997
  • "Map Algebra," recognized as a viable theoretical framework for GIS (Geographic Information System), models map layers as "operands," the basic units of geo-processing, and a variety of GIS commands as "operators." In this paper, we attempt to lift some limitations of the map algebras proposed in the GIS literature. First, we model a map layer as a "function" so that we can employ the notion of a meta operator (or higher-order function) available in the functional programming paradigm. This approach gives the map algebraic language the "programmability" needed in a GIS user language. Second, we extend the semantics and improve the syntactic structure of the map algebraic language. After the data model and language associated with map algebra are formalized, we design and implement a prototype map algebraic processor. The parser of the language in our prototype transforms the native, heterogeneous user languages of current GISs into a canonical map algebraic language. The prototype, named "MapSee," is a proof-of-concept system for the ideas proposed in this paper. We believe that a uniform interface based on the map algebraic language will provide a promising infrastructure for "Internet GIS," because a uniform but powerful interface through Web clients allows access to both geo-data and geo-processing resources distributed over the network.

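The idea in the abstract above, treating a map layer as a function so that higher-order "meta operators" can build new layer operators, can be illustrated with a minimal Python sketch. Nothing here is taken from MapSee: the names `Layer`, `constant`, `local_op`, `focal_sum`, and the sample `elevation` layer are hypothetical and chosen only for illustration.

```python
from typing import Callable

# Assumption: a map layer is modeled as a function from a grid cell
# (row, col) to a numeric value. This is an illustrative encoding only.
Layer = Callable[[int, int], float]


def constant(value: float) -> Layer:
    """A layer that has the same value in every cell."""
    return lambda r, c: value


def local_op(f: Callable[[float, float], float]) -> Callable[[Layer, Layer], Layer]:
    """Meta operator: lift a binary function on cell values
    to a cell-by-cell operator on whole layers."""
    def operator(a: Layer, b: Layer) -> Layer:
        return lambda r, c: f(a(r, c), b(r, c))
    return operator


def focal_sum(layer: Layer) -> Layer:
    """Layer operator: sum of each cell's 3x3 neighborhood."""
    return lambda r, c: sum(
        layer(r + dr, c + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
    )


# Build a layer-level "add" from an ordinary function on numbers.
add = local_op(lambda x, y: x + y)

# Hypothetical sample layers.
elevation: Layer = lambda r, c: float(10 * r + c)
raised = add(elevation, constant(5.0))

print(raised(2, 3))
print(focal_sum(elevation)(1, 1))
```

Because operators are ordinary functions over layers, they compose freely (for example `focal_sum(add(a, b))`), which is one way to read the "programmability" the abstract refers to.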

Implementation of Query Processing System in Temporal Databases (시간지원 데이터베이스의 질의처리 시스템 구현)

  • Lee, Eon-Bae;Kim, Dong-Ho;Ryu, Keun-Ho
    • The Transactions of the Korea Information Processing Society / v.5 no.6 / pp.1418-1430 / 1998
  • Temporal databases support efficient management of historical data by means of valid time and transaction time. Valid time is the time when a fact occurs in the real world, and transaction time is the time when the data is stored in the database. A Temporal Query Processing System (TQPS) should be extended to process temporal operations on the historical information in a user query as well as the conventional relational operations. This paper describes an extended temporal query processing system that builds on a previous temporal query processing system for TQuel (Temporal Query Language) and consists of a temporal syntax analyzer, temporal semantic analyzer, temporal code generator, and temporal interpreter. It also presents algorithms for additional functions such as transaction time management, temporal aggregates, temporal views, temporal joins, and heuristic optimization, together with examples of how they are processed.

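The distinction between valid time and transaction time in the abstract above can be made concrete with a small Python sketch. It is not TQuel and not the paper's system: the `Version` record, the `as_of` function, the integer timestamps, and the sample rows are all hypothetical, chosen only to show how a query can fix both time dimensions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Version:
    key: str
    value: str
    valid_from: int               # valid time: when the fact holds in the real world
    valid_to: int                 # exclusive upper bound of valid time
    tx_from: int                  # transaction time: when the row was stored
    tx_to: Optional[int] = None   # None means the row is still current in the database


def as_of(rows: List[Version], key: str, valid_at: int, tx_at: int) -> Optional[str]:
    """Return the value for `key` that held at `valid_at`,
    according to what the database knew at `tx_at`."""
    for row in rows:
        if row.key != key:
            continue
        in_valid = row.valid_from <= valid_at < row.valid_to
        in_tx = row.tx_from <= tx_at and (row.tx_to is None or tx_at < row.tx_to)
        if in_valid and in_tx:
            return row.value
    return None


# Hypothetical history: at transaction time 100 the database recorded one rank
# for 2000-2010; at transaction time 200 that row was superseded by a correction.
rows = [
    Version("kim", "engineer", 2000, 2010, tx_from=100, tx_to=200),
    Version("kim", "engineer", 2000, 2005, tx_from=200),
    Version("kim", "senior engineer", 2005, 2010, tx_from=200),
]

print(as_of(rows, "kim", valid_at=2007, tx_at=150))  # what the database believed before the correction
print(as_of(rows, "kim", valid_at=2007, tx_at=250))  # what it believes after the correction
```

Superseded rows are closed by setting `tx_to` rather than deleted, so earlier database states remain queryable; that is the sense in which transaction time preserves history.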

The Design and Implementation of Two-Way Search Algorithm using Mobile Instant Messenger (모바일 인스턴스 메신저를 이용한 양방향 검색 알고리즘의 설계 및 구현)

  • Lee, Daesik;Jang, Chungryong;Lee, Yongkwon
    • Journal of Korea Society of Digital Industry and Information Management / v.11 no.2 / pp.55-66 / 2015
  • In this paper, we design and implement a two-way search algorithm that provides a customized service to users through real-time two-way communication over a mobile instant messaging service. We design and implement an automated search system that, rather than delivering a message directly from the main server to the mobile instant messenger user, delivers it to each user's mobile terminal from a plurality of relay mobile terminals by way of the mobile instant messenger. Because the two-way search system can collect users' responses immediately, it is easy to identify each user's preferences, which can be used to establish a differentiated service plan. It also provides various content (text, photos, videos, etc.) to the user in real time through the mobile instant messenger service, without the need to install a separate application. Experimental results show that, when searching data on the DB server from a user mobile terminal, the category processing method takes about 7.06 seconds per data processing operation and completes about 13 operations per minute. The instruction processing method takes about 3.10 seconds and completes about 10 operations per minute. The natural language processing method takes about 5.13 seconds and completes about 7 operations per minute. Thus, among the category, instruction, and natural language processing methods, the instruction processing method is best in terms of data processing speed, whereas the category processing method is best in terms of the number of operations per minute.

Analysis of the Status of Natural Language Processing Technology Based on Deep Learning (딥러닝 중심의 자연어 처리 기술 현황 분석)

  • Park, Sang-Un
    • The Journal of Bigdata / v.6 no.1 / pp.63-81 / 2021
  • The performance of natural language processing is improving rapidly thanks to the recent development and application of machine learning and deep learning technologies, and its field of application is expanding as a result. In particular, as demand for analysis of unstructured text data increases, interest in NLP (Natural Language Processing) is also increasing. However, because of the complexity and difficulty of natural language preprocessing and of machine learning and deep learning theory, there are still high barriers to the use of natural language processing. In this paper, to give an overall understanding of NLP, we examine the main fields of NLP that are being actively researched and the current state of the major technologies centered on machine learning and deep learning, with the aim of providing a foundation for understanding and using NLP more easily. To that end, we investigated how NLP has changed within AI (artificial intelligence) by tracing changes in the taxonomy of AI technology. The main areas of NLP, namely language models, text classification, text generation, document summarization, question answering, and machine translation, are explained along with state-of-the-art deep learning models. In addition, the major deep learning models used in NLP are explained, and the data sets and evaluation measures used for performance evaluation are summarized. We hope that researchers who want to use NLP for various purposes in their own fields will be able to understand the overall technical status and the main technologies of NLP through this paper.

A Study of Fine Tuning Pre-Trained Korean BERT for Question Answering Performance Development (사전 학습된 한국어 BERT의 전이학습을 통한 한국어 기계독해 성능개선에 관한 연구)

  • Lee, Chi Hoon;Lee, Yeon Ji;Lee, Dong Hee
    • Journal of Information Technology Services / v.19 no.5 / pp.83-91 / 2020
  • Language models such as BERT have been an important factor in deep learning-based natural language processing. Pre-training transformer-based language models is computationally expensive because they consist of deep and broad architectures with layers that use an attention mechanism, and because they require a huge amount of training data. Hence, it has become necessary to fine-tune large pre-trained language models that were trained by Google or other companies that can afford the resources and cost. There are various techniques for fine-tuning language models, and this paper examines three of them: data augmentation, hyperparameter tuning, and partial reconstruction of the neural network. For data augmentation, we use no-answer augmentation and back-translation. We also identify some useful hyperparameter combinations through a number of experiments. Finally, we add GRU and LSTM networks to the pre-trained BERT model to boost performance. We fine-tune the pre-trained Korean language model with the methods above and raise the F1 score from the baseline to 89.66. Moreover, some failed attempts give us important lessons and point to directions for further work.

Exploring the Relationship Between Machine and Human Performance in Natural Language Processing Tasks (자연어 처리 태스크에 대한 기계와 인간의 성능 상관관계 연구)

  • Seoyoon Park;Heejae Kim;Seong-Woo Lee;Yejee Kang;Yeonji Jang;Hansaem Kim
    • Annual Conference on Human and Language Technology / 2023.10a / pp.485-490 / 2023
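  • As language models advance, LLMs that generate text and perform tasks much as humans do are emerging. However, research that focuses on how machines and humans carry out tasks, and that brings out the differences between them, is still not very active. This study aims to identify the differences between machines and humans in how they perform natural language understanding and generation tasks. To this end, we selected grammaticality judgment as the understanding task and summarization as the generation task, and conducted experiments using transformer-family models, which had been the mainstream pre-trained models, and ChatGPT 3.5, an LLM. The experiments showed that, in grammaticality judgment, the machines failed to reflect human linguistic intuition, and that, in the summarization task, humans and machines differ in their criteria for judging performance.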

Natural Language Query Framework on the Semantic Web

  • Kim, Jin-Sung
    • Proceedings of the Korean Institute of Intelligent Systems Conference / 2007.04a / pp.189-192 / 2007
  • This study proposes a Natural Language Query Framework (NLQF) on the semantic web to support intelligent deduction at the semantic level. Much prior research has focused on knowledge representation on the semantic web. However, to revitalize intelligent agent (IA)-based automated e-business contracts with human customers, a semantic-level approach to web information is needed. To enable access to web information at the semantic level, this paper first discusses patterns of complex natural language processing and then semantic web-based natural language inference in an e-business environment. The NL-based approach could help IAs on the web communicate with customers and other IAs through a more natural interface than traditional HTML-based web information. The proposed NLQF is therefore intended for use in semantic web-based intelligent e-business contracts between customers and IAs.

Korean Nominal Bank, Using Language Resources of Sejong Project (세종계획 언어자원 기반 한국어 명사은행)

  • Kim, Dong-Sung
    • Language and Information / v.17 no.2 / pp.67-91 / 2013
  • This paper describes the Korean Nominal Bank, a project that provides argument structure for instances of predicative nouns in the Sejong parsed corpus. We use the language resources of the Sejong project so that the same set of data is annotated with more and more levels of annotation, since a new type of language resource building project could otherwise yield information that is processed separately and in isolation. We base our annotation scheme on the Sejong electronic dictionary, the semantically tagged corpus, and the syntactically analyzed corpus. Our work also involves deep linguistic knowledge of the syntax-semantics interface in general. We consider semantic theories including the Frame Semantics of Fillmore (1976), the argument structure of Grimshaw (1990), and the argument alternation of Levin (1993) and Levin and Rappaport Hovav (2005). Various syntactic theories are needed to explain various sentence types, including empty categories, raising, and left (or right) dislocation. We also need an account of idiosyncratic lexical features such as collocation.
