• Title/Summary/Keyword: Natural Language Processing Research

Korean Named Entity Recognition and Classification using Word Embedding Features (Word Embedding 자질을 이용한 한국어 개체명 인식 및 분류)

  • Choi, Yunsu;Cha, Jeongwon
    • Journal of KIISE / v.43 no.6 / pp.678-685 / 2016
  • Named Entity Recognition and Classification (NERC) is the task of recognizing named entities such as person names, locations, and organizations and classifying them by type. Various studies have addressed Korean NERC, but they have shortcomings, for example fewer available features than English NERC. In this paper, we propose a method that uses word embeddings as features for Korean NERC. We generate word vectors with a Continuous Bag-of-Words (CBOW) model trained on a POS-tagged corpus, and derive word cluster symbols by applying the K-means algorithm to the word vectors. We use the word vectors and word cluster symbols as word embedding features in Conditional Random Fields (CRFs). In the experiments, performance improved over the baseline system by 1.17%, 0.61%, and 1.19% for the TV, Sports, and IT domains, respectively. The proposed method also outperforms other NERC systems, which demonstrates its effectiveness and efficiency.
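The feature construction described in this abstract can be illustrated with a minimal sketch. It assumes the CBOW vectors are already trained; the toy two-dimensional vectors, the token names, and the `vec_i` / `cluster` feature names are hypothetical and not taken from the paper. A plain K-means pass assigns each word a cluster symbol, and both the vector and the symbol are emitted as a per-token feature dictionary of the kind a CRF toolkit could consume.

```python
import random

def kmeans(vectors, k, iterations=20, seed=0):
    """Plain k-means; returns one cluster index per input vector."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    assignments = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for i, v in enumerate(vectors):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])),
            )
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, a in zip(vectors, assignments) if a == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assignments

# Hypothetical toy embeddings standing in for trained CBOW vectors.
TOY_VECTORS = {
    "Seoul/NNP": [0.9, 0.1],
    "Busan/NNP": [0.8, 0.2],
    "Samsung/NNP": [0.1, 0.9],
    "Naver/NNP": [0.2, 0.8],
}

words = list(TOY_VECTORS)
labels = kmeans([TOY_VECTORS[w] for w in words], k=2)
cluster_of = dict(zip(words, labels))

def embedding_features(word):
    """Feature dict for one token: vector dimensions plus a cluster symbol."""
    features = {f"vec_{i}": x for i, x in enumerate(TOY_VECTORS[word])}
    features["cluster"] = f"C{cluster_of[word]}"
    return features
```

Calling `embedding_features("Seoul/NNP")` returns the two vector dimensions plus a discrete cluster symbol. The discrete symbol is useful because CRF implementations typically handle categorical features more naturally than raw real-valued ones.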

The automatic Lexical Knowledge acquisition using morpheme information and Clustering techniques (어절 내 형태소 출현 정보와 클러스터링 기법을 이용한 어휘지식 자동 획득)

  • Yu, Won-Hee;Suh, Tae-Won;Lim, Heui-Seok
    • The Journal of Korean Association of Computer Education / v.13 no.1 / pp.65-73 / 2010
  • This study proposes an unsupervised lexical knowledge acquisition model to overcome the limitations of manually built lexical knowledge, on which supervised methods in natural language processing research depend. Given lexical entries as input, the proposed model acquires lexical knowledge automatically through vectorization, clustering, and lexical knowledge acquisition steps. We present parts of the resulting lexical knowledge dictionary and show how the amount and characteristics of the acquired lexical knowledge change with the model parameters. The experimental results suggest that a machine-readable dictionary can be built automatically, because the number of lexical class information clusters collected remained constant. In addition, the constructed lexical dictionary includes left and right morphosyntactic information, which reflects characteristics of Korean.

A study on integrating and discovery of semantic based knowledge model (의미 기반의 지식모델 통합과 탐색에 관한 연구)

  • Chun, Seung-Su
    • Journal of Internet Computing and Services / v.15 no.6 / pp.99-106 / 2014
  • In recent years, methods for generating and analyzing knowledge models have been proposed that use natural language processing, formal language processing, and artificial intelligence algorithms. Semantic-based knowledge models have been used effectively for decision trees and for problem solving in specific contexts. They have been based on static generation, regression analysis, and trend analysis with behavioral models, and have supported simulation for macroeconomic forecasting, particularly in complex systems and social network analysis. To integrate such knowledge-based models, this paper proposes formal methods and algorithms for integrating topic models derived from text mining. First, a keyword map obtained by text mining is automatically converted into a knowledge map and integrated into the semantic knowledge model. The paper then proposes an algorithm for projecting a significant topic map from the keyword map and the semantically equivalent model. As a result, an integrated semantic-based knowledge model becomes available.

Analyzing Correlations between Movie Characters Based on Deep Learning

  • Jin, Kyo Jun;Kim, Jong Wook
    • Journal of the Korea Society of Computer and Information / v.26 no.10 / pp.9-17 / 2021
  • Humans are social animals that obtain information and interact socially through dialogue. In conversation, the mood of a word can change depending on how one person feels toward another. Relationships between characters in films are essential for understanding the story and the lines exchanged between characters, but methods for extracting this information from films have not been investigated. Therefore, a model that automatically analyzes relationships in a movie is needed. In this paper, we propose a method that analyzes the relationships between movie characters by using deep learning techniques to measure the emotion of each character pair. The proposed method first extracts the main characters from the movie script and finds the dialogue between them. Then, to analyze the relationship between the main characters, it performs sentiment analysis on the dialogue, weights the results according to the positions of the lines within the overall time interval, and aggregates the scores. Experimental results with real data sets demonstrate that the proposed scheme can effectively measure the emotional relationship between the main characters.
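The aggregation step in this abstract (score each line, weight by position, combine) can be sketched as follows. The abstract does not state the weighting function, so the linear weight used here is purely an assumption, as are the function name `relationship_score` and the convention that sentiment scores lie in [-1, 1].

```python
def relationship_score(line_scores, weight_fn=None):
    """Position-weighted mean of per-line sentiment scores for one character pair.

    line_scores: sentiment of each dialogue line, in script order.
    weight_fn:   maps a relative position in [0, 1] to a weight.
                 The default (later lines count more) is an assumption.
    """
    n = len(line_scores)
    if n == 0:
        return 0.0
    if weight_fn is None:
        weight_fn = lambda pos: 1.0 + pos
    weighted_sum = 0.0
    weight_total = 0.0
    for i, score in enumerate(line_scores):
        pos = i / (n - 1) if n > 1 else 0.0  # relative position in the script
        w = weight_fn(pos)
        weighted_sum += w * score
        weight_total += w
    return weighted_sum / weight_total
```

Passing `weight_fn=lambda pos: 1.0` reduces this to an unweighted mean, which makes it easy to compare the effect of any particular weighting choice.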

A study on the Extraction of Similar Information using Knowledge Base Embedding for Battlefield Awareness

  • Kim, Sang-Min;Jin, So-Yeon;Lee, Woo-Sin
    • Journal of the Korea Society of Computer and Information / v.26 no.11 / pp.33-40 / 2021
  • As strategies become more advanced and complex, the complexity of the information that a commander must analyze is increasing. An intelligent service that can analyze the battlefield is needed to support the commander's timely judgment. This service consists of three steps: extracting knowledge from battlefield information, building a knowledge base, and analyzing the battlefield information from the knowledge base. This paper extracts information similar to an input query by embedding the knowledge base built in the second step. A transformation model, which uses the random-walk algorithm, is needed to generate the embedded knowledge base. The transformed information is embedded using Word2Vec, and similar information is extracted through cosine similarity. In this paper, 980 sentences were generated from an open knowledge base and embedded as 100-dimensional vectors, and it was confirmed that similar entities were extracted through cosine similarity.
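A rough sketch of the two ends of this pipeline is given below: turning knowledge-base triples into token sequences by random walks, and comparing embedded vectors by cosine similarity. The Word2Vec training step in between is omitted. The triple format, the function names, and the parameters `walk_length` and `walks_per_node` are assumptions for illustration, not details from the paper.

```python
import math
import random

def random_walks(triples, walk_length, walks_per_node, seed=0):
    """Generate token sequences by walking (head, relation, tail) triples."""
    rng = random.Random(seed)
    graph = {}
    for head, relation, tail in triples:
        graph.setdefault(head, []).append((relation, tail))
    walks = []
    for start in sorted(graph):
        for _ in range(walks_per_node):
            walk = [start]
            node = start
            for _ in range(walk_length):
                edges = graph.get(node)
                if not edges:
                    break  # dead end: the entity has no outgoing relation
                relation, node = rng.choice(edges)
                walk += [relation, node]
            walks.append(walk)
    return walks

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)
```

Each walk is a "sentence" of alternating entities and relations, which is the kind of input a word-embedding model expects. After training, the entity closest to a query vector under `cosine_similarity` would be reported as the most similar.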

A Study on the Improvement Model of Document Retrieval Efficiency of Tax Judgment (조세심판 문서 검색 효율 향상 모델에 관한 연구)

  • Lee, Hoo-Young;Park, Koo-Rack;Kim, Dong-Hyun
    • Journal of the Korea Convergence Society / v.10 no.6 / pp.41-47 / 2019
  • When a case is judged, it is very important to search for and obtain similar precedents. Existing precedent document search relies on keywords entered by the user. However, this requires an accurate keyword, and if the keyword is unknown, the necessary document cannot be found. In addition, a retrieved document may turn out to have unrelated contents. In this paper, to find accurate precedents, we aim to improve the effectiveness of a method that vectorizes documents into a three-dimensional space, calculates cosine similarity, and retrieves the closest documents. To this end, after analyzing the similarity of the words used in precedents, we extract the mode and insert it into the document text, thereby improving the cosine similarity of the document to be retrieved. We expect that the proposed model will provide a fast and accurate search to users looking for tax-related precedents.
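The idea of inserting the mode into a document to raise its cosine similarity can be illustrated on a toy scale. This sketch swaps in simple term-count vectors rather than the three-dimensional vectorization the abstract mentions, and it assumes that "the mode" means the most frequent token of the document; both choices, and the function names, are assumptions.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-count Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)

def insert_mode_term(tokens):
    """Return a copy of tokens with its most frequent token appended once."""
    if not tokens:
        return []
    mode, _count = Counter(tokens).most_common(1)[0]
    return list(tokens) + [mode]

# Toy document and query (hypothetical text).
document = "tax appeal on income tax refund".split()
query = ["tax"]

before = cosine(Counter(query), Counter(document))
after = cosine(Counter(query), Counter(insert_mode_term(document)))
```

In this toy case `after` is larger than `before`, because the inserted token reinforces the term the query shares with the document. Whether the same holds for real tax judgment documents depends on the vectorization actually used.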

Introduction and Analysis of Open Source Software Development Methodology (오픈소스 SW 개발 방법론 소개 및 분석)

  • Son, Kyung A;Yun, Young-Sun
    • Journal of Software Assessment and Valuation / v.16 no.2 / pp.163-172 / 2020
  • Recently, Fourth Industrial Revolution technologies such as artificial intelligence, big data, and cloud computing have been introduced, and the limits of individual or team development policies are being reexamined. In addition, the source code of many of the latest technologies has been opened to the public, and related studies are building on it. Meanwhile, companies are applying the strengths of the open source software development methodology to proprietary software development and publicly announcing support for it. In this paper, we introduce several software development methodologies that have been actively discussed recently, namely the open source model, the inner source model, and the similar DevOps model, and compare their characteristics and components. Rather than claiming that a specific model is superior, we argue that individuals and organizations that establish a software development policy suited to the benefits of each model will be able to improve software quality while satisfying customer requirements.

A Study on Lightweight Transformer Based Super Resolution Model Using Knowledge Distillation (지식 증류 기법을 사용한 트랜스포머 기반 초해상화 모델 경량화 연구)

  • Dong-hyun Kim;Dong-hun Lee;Aro Kim;Vani Priyanka Galia;Sang-hyo Park
    • Journal of Broadcast Engineering / v.28 no.3 / pp.333-336 / 2023
  • Recently, the transformer model used in natural language processing has also been applied to image super resolution, where it shows good performance. However, transformer-based models are complex, have many learnable parameters, and require substantial hardware resources, which makes them difficult to use on small mobile devices. Therefore, in this paper, we propose a knowledge distillation technique that can effectively reduce the size of a transformer-based super resolution model. Experiments confirmed that applying the proposed technique to a student model with a reduced number of transformer blocks yields performance similar to or higher than that of the teacher model.
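The abstract does not specify the distillation objective, so the following is only a generic output-level sketch of knowledge distillation for an image-to-image model. It assumes the student is trained on a blend of two L1 terms, one against the ground-truth high-resolution pixels and one against the teacher's output; the blend weight `alpha`, the function names, and the flat-list pixel representation are all hypothetical.

```python
def l1_distance(a, b):
    """Mean absolute difference between two equal-length pixel lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def distillation_loss(student_out, teacher_out, target, alpha=0.5):
    """Blend of a reconstruction term and a teacher-imitation term.

    alpha = 1.0 ignores the teacher (ordinary supervised training);
    alpha = 0.0 trains the student only to mimic the teacher.
    """
    reconstruction = l1_distance(student_out, target)
    imitation = l1_distance(student_out, teacher_out)
    return alpha * reconstruction + (1.0 - alpha) * imitation
```

For example, with `student_out=[0.5, 0.5]`, `teacher_out=[0.5, 1.0]`, and `target=[1.0, 1.0]`, the reconstruction term is 0.5 and the imitation term is 0.25, so the default blend gives 0.375.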

Verification of educational goal of reading area in Korean SAT through natural language processing techniques (대학수학능력시험 독서 영역의 교육 목표를 위한 자연어처리 기법을 통한 검증)

  • Lee, Soomin;Kim, Gyeongmin;Lim, Heuiseok
    • Journal of the Korea Convergence Society / v.13 no.1 / pp.81-88 / 2022
  • The major educational goal of the reading section, which makes up an important portion of the Korean language test in the Korean SAT, is to evaluate whether a given text can be fully understood. Therefore, the questions in the exam must be solvable from the given text alone. In this paper, we developed a dataset based on the reading section of the Korean SAT in order to evaluate whether a deep learning language model can classify a given question as true or false, which is a binary classification task in NLP. As a result, by applying language models using only the passages in the dataset, most language models exceeded the human performance of 59.2% in F1 score, with KoELECTRA scoring 62.49% in our experiment. We also showed that the structural limits of language models can be eased by adjusting data preprocessing.
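Since the results above are reported as F1 scores on a true/false classification task, a minimal sketch of the metric may help when reading the numbers. It computes the F1 score of the positive class only; the abstract does not say which averaging scheme was used, so treating `True` as the positive class is an assumption.

```python
def f1_score(y_true, y_pred, positive=True):
    """F1 of the positive class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0  # no correct positive predictions
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

With gold labels `[True, True, False, False]` and predictions `[True, False, True, False]`, there is one true positive, one false positive, and one false negative, so precision and recall are both 0.5 and the F1 score is 0.5.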

Comparison of Fault Diagnosis Accuracy Between XGBoost and Conv1D Using Long-Term Operation Data of Ship Fuel Supply Instruments (선박 연료 공급 기기류의 장시간 운전 데이터의 고장 진단에 있어서 XGBoost 및 Conv1D의 예측 정확성 비교)

  • Hyung-Jin Kim;Kwang-Sik Kim;Se-Yun Hwang;Jang-Hyun Lee
    • Proceedings of the Korean Institute of Navigation and Port Research Conference / 2022.06a / pp.110-110 / 2022
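  • This study was carried out as part of the development of remote fault diagnosis techniques for autonomous ships. In particular, it presents the results of implementing algorithms for condition diagnosis from time-series data measured on engine fuel system equipment. Vibration time-series data were measured on onshore test equipment comprising an engine fuel pump and a purifier, and deep learning and machine learning algorithms capable of anomaly detection, fault classification, and fault prediction were implemented. Faults of each type were artificially induced in the onshore test equipment, and the resulting characteristic vibration signals were measured and used for training. Because the signal of an earlier event affects later events in the measured data, learning algorithms that can reflect temporal dependencies were proposed for the fault states embedded in the time series. To reflect the time dependency of fault events, the recurrent models RNN (Recurrent Neural Networks) and LSTM (Long Short-Term Memory) and the Conv1D model, which is based on the convolution operation (Convolutional Neural Network), were applied and their prediction accuracy was compared. In particular, noting that the recurrent RNN and LSTM models show advantages in high-dimensional sequential natural language processing, the convolution-based Conv1D algorithm, which can reflect the time dependency of signals in learning, was used for fault prediction. In addition, considering the efficiency of machine learning models, XGBoost was also applied to fault prediction. Finally, the fault prediction performance of the Conv1D and XGBoost models was compared using the vibration signals of the fuel pump and the purifier.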
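The core operation behind a Conv1D layer can be sketched in a few lines. This is a bare single-channel version without padding, bias, or activation, written as a sliding dot product in the way deep learning libraries usually define convolution; the function name and the sample values are hypothetical and unrelated to the vibration data in the study.

```python
def conv1d(signal, kernel, stride=1):
    """Slide the kernel over the signal and take a dot product at each step."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(0, len(signal) - k + 1, stride)
    ]

# A difference kernel responds to local change in the signal.
response = conv1d([1, 2, 3, 4], [1, 0, -1])
```

Each output value depends only on a short window of neighboring samples, which is how a convolutional model captures local temporal patterns in a time series.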