• 제목/요약/키워드: Neural Embedding Model

검색결과 79건 처리시간 0.022초

딥러닝 기반의 다범주 감성분석 모델 개발 (Development of Deep Learning Models for Multi-class Sentiment Analysis)

  • 알렉스 샤이코니;서상현;권영식
    • 한국IT서비스학회지
    • /
    • 제16권4호
    • /
    • pp.149-160
    • /
    • 2017
  • Sentiment analysis is the process of determining whether a piece of document, text or conversation is positive, negative, neural or other emotion. Sentiment analysis has been applied for several real-world applications, such as chatbot. In the last five years, the practical use of the chatbot has been prevailing in many field of industry. In the chatbot applications, to recognize the user emotion, sentiment analysis must be performed in advance in order to understand the intent of speakers. The specific emotion is more than describing positive or negative sentences. In light of this context, we propose deep learning models for conducting multi-class sentiment analysis for identifying speaker's emotion which is categorized to be joy, fear, guilt, sad, shame, disgust, and anger. Thus, we develop convolutional neural network (CNN), long short term memory (LSTM), and multi-layer neural network models, as deep neural networks models, for detecting emotion in a sentence. In addition, word embedding process was also applied in our research. In our experiments, we have found that long short term memory (LSTM) model performs best compared to convolutional neural networks and multi-layer neural networks. Moreover, we also show the practical applicability of the deep learning models to the sentiment analysis for chatbot.

양방향 순환신경망 임베딩을 이용한 리그오브레전드 승패 예측 (Predicting Win-Loss of League of Legends Using Bidirectional LSTM Embedding)

  • 김철기;이수원
    • 정보처리학회논문지:소프트웨어 및 데이터공학
    • /
    • 제9권2호
    • /
    • pp.61-68
    • /
    • 2020
  • e-sports는 최근 꾸준한 성장을 이루면서 세계적인 인기 스포츠 종목이 되었다. 본 논문에서는 e-sports의 대표적인 게임인 리그오브레전드 경기 시작 단계에서의 승패 예측 모델을 제안한다. 리그오브레전드에서는 챔피언이라고 불리는 게임 상의 유닛을 플레이어가 선택하여 플레이하게 되는데, 각 플레이어의 선택을 통하여 구성된 팀의 챔피언 능력치 조합은 승패에 영향을 미친다. 제안 모델은 별다른 도메인 지식 없이 플레이어 단위 챔피언 능력치를 팀 단위 챔피언 능력치로 임베딩한 Bidirectional LSTM 임베딩 기반 딥러닝 모델이다. 기존 분류 모델들과 비교 결과 팀 단위 챔피언 능력치 조합을 고려한 제안 모델에서 58.07%의 가장 높은 예측 정확도를 보였다.

Prediction of Wind Power by Chaos and BP Artificial Neural Networks Approach Based on Genetic Algorithm

  • Huang, Dai-Zheng;Gong, Ren-Xi;Gong, Shu
    • Journal of Electrical Engineering and Technology
    • /
    • 제10권1호
    • /
    • pp.41-46
    • /
    • 2015
  • It is very important to make accurate forecast of wind power because of its indispensable requirement for power system stable operation. The research is to predict wind power by chaos and BP artificial neural networks (CBPANNs) method based on genetic algorithm, and to evaluate feasibility of the method of predicting wind power. A description of the method is performed. Firstly, a calculation of the largest Lyapunov exponent of the time series of wind power and a judgment of whether wind power has chaotic behavior are made. Secondly, phase space of the time series is reconstructed. Finally, the prediction model is constructed based on the best embedding dimension and best delay time to approximate the uncertain function by which the wind power is forecasted. And then an optimization of the weights and thresholds of the model is conducted by genetic algorithm (GA). And a simulation of the method and an evaluation of its effectiveness are performed. The results show that the proposed method has more accuracy than that of BP artificial neural networks (BP-ANNs).

이미지 캡션 생성을 위한 심층 신경망 모델의 설계 (Design of a Deep Neural Network Model for Image Caption Generation)

  • 김동하;김인철
    • 정보처리학회논문지:소프트웨어 및 데이터공학
    • /
    • 제6권4호
    • /
    • pp.203-210
    • /
    • 2017
  • 본 논문에서는 이미지 캡션 생성과 모델 전이에 효과적인 심층 신경망 모델을 제시한다. 본 모델은 멀티 모달 순환 신경망 모델의 하나로서, 이미지로부터 시각 정보를 추출하는 컨볼루션 신경망 층, 각 단어를 저차원의 특징으로 변환하는 임베딩 층, 캡션 문장 구조를 학습하는 순환 신경망 층, 시각 정보와 언어 정보를 결합하는 멀티 모달 층 등 총 5 개의 계층들로 구성된다. 특히 본 모델에서는 시퀀스 패턴 학습과 모델 전이에 우수한 LSTM 유닛을 이용하여 순환 신경망 층을 구성하며, 캡션 문장 생성을 위한 매 순환 단계마다 이미지의 시각 정보를 이용할 수 있도록 컨볼루션 신경망 층의 출력을 순환 신경망 층의 초기 상태뿐만 아니라 멀티 모달 층의 입력에도 연결하는 구조를 가진다. Flickr8k, Flickr30k, MSCOCO 등의 공개 데이터 집합들을 이용한 다양한 비교 실험들을 통해, 캡션의 정확도와 모델 전이의 효과 면에서 본 논문에서 제시한 멀티 모달 순환 신경망 모델의 높은 성능을 확인할 수 있었다.

Korean Sentiment Analysis Using Natural Network: Based on IKEA Review Data

  • Sim, YuJeong;Yun, Dai Yeol;Hwang, Chi-gon;Moon, Seok-Jae
    • International Journal of Internet, Broadcasting and Communication
    • /
    • 제13권2호
    • /
    • pp.173-178
    • /
    • 2021
  • In this paper, we find a suitable methodology for Korean Sentiment Analysis through a comparative experiment in which methods of embedding and natural network models are learned at the highest accuracy and fastest speed. The embedding method compares word embeddeding and Word2Vec. The model compares and experiments representative neural network models CNN, RNN, LSTM, GRU, Bi-LSTM and Bi-GRU with IKEA review data. Experiments show that Word2Vec and BiGRU had the highest accuracy and second fastest speed with 94.23% accuracy and 42.30 seconds speed. Word2Vec and GRU were found to have the third highest accuracy and fastest speed with 92.53% accuracy and 26.75 seconds speed.

워드 임베딩을 활용한 한국어 가짜뉴스 탐지 모델에 관한 연구 (A Study on Korean Fake news Detection Model Using Word Embedding)

  • 심재승;이재준;정이태;안현철
    • 한국컴퓨터정보학회:학술대회논문집
    • /
    • 한국컴퓨터정보학회 2020년도 제62차 하계학술대회논문집 28권2호
    • /
    • pp.199-202
    • /
    • 2020
  • 본 논문에서는 가짜뉴스 탐지 모델에 워드 임베딩 기법을 접목하여 성능을 향상시키는 방법을 제안한다. 기존의 한국어 가짜뉴스 탐지 연구는 희소 표현인 빈도-역문서 빈도(TF-IDF)를 활용한 탐지 모델들이 주를 이루었다. 하지만 이는 가짜뉴스 탐지의 관점에서 뉴스의 언어적 특성을 파악하는 데 한계가 존재하는데, 특히 문맥에서 드러나는 언어적 특성을 구조적으로 반영하지 못한다. 이에 밀집 표현 기반의 워드 임베딩 기법인 Word2vec을 활용한 텍스트 전처리를 통해 문맥 정보까지 반영한 가짜뉴스 탐지 모델을 본 연구의 제안 모델로 생성한 후 TF-IDF 기반의 가짜뉴스 탐지 모델을 비교 모델로 생성하여 두 모델 간의 비교를 통한 성능 검증을 수행하였다. 그 결과 Word2vec 기반의 제안모형이 더욱 우수하였음을 확인하였다.

  • PDF

Word-Level Embedding to Improve Performance of Representative Spatio-temporal Document Classification

  • Byoungwook Kim;Hong-Jun Jang
    • Journal of Information Processing Systems
    • /
    • 제19권6호
    • /
    • pp.830-841
    • /
    • 2023
  • Tokenization is the process of segmenting the input text into smaller units of text, and it is a preprocessing task that is mainly performed to improve the efficiency of the machine learning process. Various tokenization methods have been proposed for application in the field of natural language processing, but studies have primarily focused on efficiently segmenting text. Few studies have been conducted on the Korean language to explore what tokenization methods are suitable for document classification task. In this paper, an exploratory study was performed to find the most suitable tokenization method to improve the performance of a representative spatio-temporal document classifier in Korean. For the experiment, a convolutional neural network model was used, and for the final performance comparison, tasks were selected for document classification where performance largely depends on the tokenization method. As a tokenization method for comparative experiments, commonly used Jamo, Character, and Word units were adopted. As a result of the experiment, it was confirmed that the tokenization of word units showed excellent performance in the case of representative spatio-temporal document classification task where the semantic embedding ability of the token itself is important.

Automatic extraction of similar poetry for study of literary texts: An experiment on Hindi poetry

  • Prakash, Amit;Singh, Niraj Kumar;Saha, Sujan Kumar
    • ETRI Journal
    • /
    • 제44권3호
    • /
    • pp.413-425
    • /
    • 2022
  • The study of literary texts is one of the earliest disciplines practiced around the globe. Poetry is artistic writing in which words are carefully chosen and arranged for their meaning, sound, and rhythm. Poetry usually has a broad and profound sense that makes it difficult to be interpreted even by humans. The essence of poetry is Rasa, which signifies mood or emotion. In this paper, we propose a poetry classification-based approach to automatically extract similar poems from a repository. Specifically, we perform a novel Rasa-based classification of Hindi poetry. For the task, we primarily used lexical features in a bag-of-words model trained using the support vector machine classifier. In the model, we employed Hindi WordNet, Latent Semantic Indexing, and Word2Vec-based neural word embedding. To extract the rich feature vectors, we prepared a repository containing 37 717 poems collected from various sources. We evaluated the performance of the system on a manually constructed dataset containing 945 Hindi poems. Experimental results demonstrated that the proposed model attained satisfactory performance.

Two-stage Deep Learning Model with LSTM-based Autoencoder and CNN for Crop Classification Using Multi-temporal Remote Sensing Images

  • Kwak, Geun-Ho;Park, No-Wook
    • 대한원격탐사학회지
    • /
    • 제37권4호
    • /
    • pp.719-731
    • /
    • 2021
  • This study proposes a two-stage hybrid classification model for crop classification using multi-temporal remote sensing images; the model combines feature embedding by using an autoencoder (AE) with a convolutional neural network (CNN) classifier to fully utilize features including informative temporal and spatial signatures. Long short-term memory (LSTM)-based AE (LAE) is fine-tuned using class label information to extract latent features that contain less noise and useful temporal signatures. The CNN classifier is then applied to effectively account for the spatial characteristics of the extracted latent features. A crop classification experiment with multi-temporal unmanned aerial vehicle images is conducted to illustrate the potential application of the proposed hybrid model. The classification performance of the proposed model is compared with various combinations of conventional deep learning models (CNN, LSTM, and convolutional LSTM) and different inputs (original multi-temporal images and features from stacked AE). From the crop classification experiment, the best classification accuracy was achieved by the proposed model that utilized the latent features by fine-tuned LAE as input for the CNN classifier. The latent features that contain useful temporal signatures and are less noisy could increase the class separability between crops with similar spectral signatures, thereby leading to superior classification accuracy. The experimental results demonstrate the importance of effective feature extraction and the potential of the proposed classification model for crop classification using multi-temporal remote sensing images.

BiLSTM 모델과 형태소 자질을 이용한 서술어 인식 방법 (Predicate Recognition Method using BiLSTM Model and Morpheme Features)

  • 남충현;장경식
    • 한국정보통신학회논문지
    • /
    • 제26권1호
    • /
    • pp.24-29
    • /
    • 2022
  • 정보 추출 및 질의응답 시스템 등 다양한 자연어 처리 분야에서 사용되는 의미역 결정은 주어진 문장과 서술어에 대해 서술어와 연관성 있는 논항들의 관계를 파악하는 작업이다. 입력으로 사용되는 서술어는 형태소 분석과 같은 어휘적 분석 결과를 이용하여 추출하지만, 한국어 특성상 문장의 의미에 따라 다양한 패턴을 가질 수 있기 때문에 모든 언어학적 패턴을 만들 수 없다는 문제점이 있다. 본 논문에서는 사전에 언어학적 패턴을 정의하지 않고 신경망 모델과 사전 학습된 임베딩 모델 및 형태소 자질을 추가한 한국어 서술어를 인식하는 방법을 제안한다. 실험은 모델의 변경 가능한 파라미터에 대한 성능 비교, 임베딩 모델과 형태소 자질의 사용 유무에 따른 성능 비교를 하였으며, 그 결과 제안한 신경망 모델이 92.63%의 성능을 보였음을 확인하였다.