• Title/Summary/Keyword: 단어 분리 (word separation)

Temporal Relationship Extraction for Natural Language Texts by Using Deep Bidirectional Language Model (양방향 언어 모델을 활용한 자연어 텍스트의 시간 관계정보 추출 기법)

  • Lim, Chae-Gyun;Choi, Ho-Jin
    • Annual Conference on Human and Language Technology / 2019.10a / pp.81-84 / 2019
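  • Documents written in natural language usually contain time-related information, and recognizing that information accurately is important for understanding a document's overall content and context. Tasks for finding temporal information in a given document include recognizing time expressions themselves, finding events associated with time expressions, and extracting the temporal relationships that hold between time expressions or events. Because linguistic characteristics differ from language to language, extracting all temporal information from given sentences is quite difficult unless the relations among pieces of temporal information are taken into account. This paper proposes a technique that uses a deep neural network language model trained with a bidirectional architecture to extract temporal relationship information, one of the tasks for finding temporal information, from Korean input sentences. The technique splits a given single sentence into individual word tokens, converts them into embedding vectors, and is trained to recognize which types of temporal relationship are present in the sentence by taking the latent information of each token into account. Experiments on a Korean corpus annotated with temporal information are also conducted to confirm the accuracy with which the proposed technique recognizes temporal relationships.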

The Bi-Cross Pretraining Method to Enhance Language Representation (Bi-Cross 사전 학습을 통한 자연어 이해 성능 향상)

  • Kim, Sung-ju;Kim, Seonhoon;Park, Jinseong;Yoo, Kang Min;Kang, Inho
    • Annual Conference on Human and Language Technology / 2021.10a / pp.320-325 / 2021
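  • BERT learns next sentence prediction and masked word prediction during pre-training and has shown high performance on many downstream natural language tasks. This study focuses on next sentence prediction among BERT's pre-training tasks. Next sentence prediction has been used to improve performance on tasks that model the relationship between two arbitrary sentences, such as natural language inference and question answering. However, BERT's next sentence prediction learns only the cross-encoding scheme, in which two sentences are separated by a special token and fed to the model as a single string, so it does not take into account downstream tasks that use the bi-encoding scheme, in which each sentence is encoded separately. This paper introduces the Bi-Cross pre-training method, which extends BERT's next sentence prediction by additionally pre-training on a bi-encoding version of the task, improving performance on single-sentence classification and on tasks that use sentence embeddings. On NSMC, a movie review sentiment classification dataset, in a setting that uses only 0.1% of the training data, Bi-Cross pre-training improved performance by about 5 points over the model without it. In the KorSTS evaluation of bi-encoding sentence embeddings, it improved performance by 1.5 points over the model without it.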
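The difference between the two input schemes described in this abstract can be sketched as follows. This is a minimal illustration, not the authors' code: `toy_encode` is a hypothetical bag-of-words stand-in for a real sentence encoder, and all function names are assumptions made for the example.

```python
import math
from collections import Counter


def cross_encode_input(sent_a: str, sent_b: str) -> str:
    """Cross-encoding: both sentences enter the model as ONE string,
    separated by a special token (BERT-style [CLS]/[SEP] markers)."""
    return f"[CLS] {sent_a} [SEP] {sent_b} [SEP]"


def toy_encode(sentence: str) -> Counter:
    """Hypothetical stand-in for a sentence encoder: a bag-of-words count
    vector. A real bi-encoder would return a dense embedding instead."""
    return Counter(sentence.lower().split())


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def bi_encode_score(sent_a: str, sent_b: str) -> float:
    """Bi-encoding: each sentence is encoded on its own, and the two
    vectors are compared only afterwards."""
    return cosine(toy_encode(sent_a), toy_encode(sent_b))


if __name__ == "__main__":
    print(cross_encode_input("the movie was fun", "i liked it"))
    print(bi_encode_score("the movie was fun", "the movie was dull"))
```

In the bi-encoding path the two sentences never see each other inside the encoder, which is why sentence-embedding tasks such as semantic textual similarity depend on it.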

A Study on the Development of Visual Arts Convergence Education Model with the Formless Concept (비정형 개념에 따른 시각예술 융합교육 모형 개발)

  • Cho, Hyun Geun
    • Korea Science and Art Forum / v.37 no.2 / pp.275-292 / 2019
  • This study began from the recognition that new and diverse approaches are needed, because imitation has become familiar in the design process, as in the way an image is drawn. I therefore studied a convergence of the humanities and visual arts through an understanding of, and conceptual approach to, the formless. The purpose of this study is to develop a formless language and to organize practical courses that enable deeper research and design expression, covering the theoretical approaches and the explanations of outcomes required before and after practice related to the formless. The method is to derive detailed items from selected words through prior research, studies of works and authors, and practice teaching. As a result, I propose a formless language whose operating mechanisms and parent items are horizontality in the spatial positioning system, pulse in the separation of space and time, entropy in the structural order of the system, and base materialism in the limitation of matter. These elements are related to shape, size, shading, color, texture, space, and structure as visual elements of form, and they carry various adjectival meanings as subordinate concepts. I then present basic design education materials that enable understanding and expression of the formless language across the whole process of formless visual art (theoretical approach, practice course, presentation, etc.). Based on these results, I hope that the materials will be used as educational content that helps learners express and understand new and different kinds of beauty, as a means of revealing social identity, and as a reference for research on formless visual arts.

A Study on the Construction of the Automatic Summaries - on the basis of Straight News in the Web - (자동요약시스템 구축에 대한 연구 - 웹 상의 보도기사를 중심으로 -)

  • Lee, Tae-Young
    • Journal of the Korean Society for Information Management / v.23 no.4 s.62 / pp.41-67 / 2006
  • A writing frame and various rules based on discourse structure and knowledge-based methods were applied to construct an automatic Ext/sums (extracts and summaries) system for straight news on the web. The frame contains slots and facets, represented by the roles of paragraphs, sentences, and clauses in news, together with the rules that determine the slot type. Candidate summary sentences were rearranged through unification, separation, and synthesis while maintaining coherence of meaning, using rules derived from similarity measurement, syntactic information, discourse structure, and knowledge-based methods, as well as context plots defined by the syntactic/semantic signatures of nouns and verbs and the categories of verb suffixes. An attempt was also made to insert critic sentences into the summary.

Reduction of Computing Time in Aircraft Control by Delta Operating Singular Perturbation Technique (델타연산자 섭동방법에 의한 항공기 동력학의 연산시간 감소)

  • Sim, Gyu Hong;Sa, Wan
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.31 no.3 / pp.39-49 / 2003
  • The delta operator approach and the singular perturbation technique are introduced. The former reduces round-off error in numerical computation. The latter reduces computing time by decoupling the original system into fast and slow subsystems. Aircraft dynamics consist of the phugoid and short-period motions, whether the model is longitudinal or lateral. In this paper, approximate solutions of the lateral dynamic model of the Beaver, obtained by using these two methods, are compared with the exact solution. For the open-loop and closed-loop systems, the approximate solution becomes identical to the exact solution with only one iteration and without iteration, respectively. It is therefore shown that these approaches are very effective in flight dynamics and control.
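For orientation, the two techniques named in this abstract are usually written in the following textbook forms. This is a generic sketch, not the paper's aircraft model: the symbols $A_q$, $B_q$, $A_\delta$, $B_\delta$, $\Delta$, $\varepsilon$, and the block matrices $A_{ij}$ are notation assumed here for illustration.

```latex
% Shift-operator form of a sampled system with sampling period \Delta
x_{k+1} = A_q x_k + B_q u_k

% Delta operator and the equivalent delta-domain model
\delta x_k \triangleq \frac{x_{k+1} - x_k}{\Delta} = A_\delta x_k + B_\delta u_k,
\qquad A_\delta = \frac{A_q - I}{\Delta}, \quad B_\delta = \frac{B_q}{\Delta}

% Standard singularly perturbed form with slow state x and fast state z
\dot{x} = A_{11} x + A_{12} z, \qquad
\varepsilon \dot{z} = A_{21} x + A_{22} z, \qquad 0 < \varepsilon \ll 1
```

As $\Delta \to 0$ the shift-form matrix $A_q$ approaches the identity, so its entries lose resolution in finite precision, whereas $A_\delta$ approaches the continuous-time system matrix; this is the usual explanation for the round-off benefit. The small parameter $\varepsilon$ is what allows the fast and slow subsystems to be solved separately.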

On-Line Korean Character Recognition by the Stroke Information of Korean Phoneme in Multimedia Terminal (한글 자소의 획 정보에 의한 멀티미디어 단말기에서의 온라인 한글 문자 인식)

  • Oh Juntaek;Jung Momoon;Lee Woobeom;Kim Wookhyun
    • Journal of the Institute of Convergence Signal Processing / v.1 no.1 / pp.64-73 / 2000
  • Korean character recognition technology for the user interface of a multimedia terminal requires fast processing and a high recognition rate. In this paper, we propose a phoneme and character recognition technology that uses characteristic information of Korean and features of input strokes, i.e., feature points, feature vectors, virtual vectors, and positional relations between strokes. Recognition of both phonemes and characters across users' various writing styles relies on a Korean database. The database is constructed from the characteristic information of Korean and from phoneme models that hold various stroke information. We also use successive processing based on the positional relations between strokes and backtracking processing that modifies the number of strokes composing each phoneme. This method reduces the complex processing of phoneme separation. In a recognition experiment that tested 600 characters written by 10 people among 1,200 words, the proposed on-line Korean character recognition system achieved an average character processing time of 13 ms and a correct recognition rate of more than 95%.

TREATMENT OF DOUBLE TOOTH IN MANDIBULAR LATERAL INCISORS (하악 영구 측절치 Double tooth의 치험례)

  • Kim, Sang-Bae;Lee, Kwang-Soo
    • Journal of the Korean Academy of Pediatric Dentistry / v.27 no.3 / pp.383-387 / 2000
  • Fusion is defined as the union of two separate tooth buds at some stage of their development, with confluence of dentin, and is characterized by separate root canals and a single large crown, while gemination is defined as an attempt of a single tooth bud to divide incompletely, usually resulting in a single root with one root canal and two completely or incompletely separated crowns. It is sometimes difficult to decide whether an abnormally large tooth is the result of fusion of a normal and a supernumerary tooth or of gemination; using the term 'double tooth' lets clinicians avoid this difficulty (Brook & Winter). There are commonly no symptoms, but the problems associated with these anomalies include esthetics, possible loss of arch length, delayed or ectopic eruption of the permanent teeth, caries along the line of demarcation, and periodontal disease. A double tooth commonly does not need to be treated in the primary dentition, but in the permanent dentition treatment may be requested because of esthetic and other problems. In our case, an 8-year-old girl presented with a double tooth, and we obtained favorable results by performing hemisection with apexification.

High-Speed Korean Address Searching System for Efficient Delivery Point Code Generation (효율적인 순로코드 발생을 위한 고속 한글 주소검색 시스템 개발)

  • Kim, Gyeong-Hwan;Lee, Seok-Goo;Shin, Mi-Young;Nam, Yun-Seok
    • The KIPS Transactions: Part D / v.8D no.3 / pp.273-284 / 2001
  • A systematic approach for interpreting Korean addresses based on the postal code is presented in this paper. The implementation focuses on producing the final delivery point code from the various types of recognized addresses. Address interpretation has two stages: 1) agreement verification between the recognized postal code and the upper part of the address and 2) analysis of the lower part of the address. In the agreement verification procedure, the recognized postal code is used as the key to the address dictionary, and each of the retrieved addresses is compared with the words in the recognized address. As a result, the boundary between the upper part and the lower part is located. A confusion matrix, introduced to correct possibly misrecognized characters, is applied to improve the performance of the process. In the procedure for interpreting the lower part of the address, a delivery code is assigned using the house number and/or the building name. Several interpretation rules have been developed based on the real addresses collected. Experiments have been performed to evaluate the proposed approach using addresses collected from the Kwangju and Pusan areas.
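The agreement verification stage described above can be sketched roughly as follows. Everything in this example is hypothetical: the dictionary contents, the postal code, the place names, and the `CONFUSABLE` set (a simplified stand-in for the paper's confusion matrix) are invented placeholders, and the function names are assumptions.

```python
# Hypothetical address dictionary: postal code -> known upper-address word lists.
ADDRESS_DICT = {
    "123-456": [["CityA", "DistrictB", "TownC"]],
}

# Hypothetical confusable character pairs, standing in for a confusion matrix
# of characters the recognizer tends to mix up.
CONFUSABLE = {("0", "o"), ("1", "l")}


def chars_match(a: str, b: str) -> bool:
    """Two characters agree if equal or listed as a confusable pair."""
    return a == b or (a, b) in CONFUSABLE or (b, a) in CONFUSABLE


def words_match(recognized: str, expected: str) -> bool:
    """Word-level comparison that tolerates confusable characters."""
    return len(recognized) == len(expected) and all(
        chars_match(r, e) for r, e in zip(recognized, expected)
    )


def split_address(postal_code: str, words: list):
    """Use the postal code as the dictionary key, compare each retrieved
    address with the leading recognized words, and return
    (corrected upper part, lower part), or None if nothing agrees."""
    for entry in ADDRESS_DICT.get(postal_code, []):
        n = len(entry)
        if len(words) >= n and all(
            words_match(r, e) for r, e in zip(words[:n], entry)
        ):
            return entry, words[n:]
    return None


if __name__ == "__main__":
    # "T0wnC" contains a misrecognized character that the pair ("0", "o") absorbs.
    print(split_address("123-456", ["CityA", "DistrictB", "T0wnC", "12-3"]))
```

The index where the dictionary entry stops matching is the boundary between the upper and lower parts; the remaining words (house number, building name) would go to the second stage.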

Enhancing Korean Alphabet Unit Speech Recognition with Neural Network-Based Alphabet Merging Methodology (한국어 자모단위 음성인식 결과 후보정을 위한 신경망 기반 자모 병합 방법론)

  • Solee Im;Wonjun Lee;Gary Geunbae Lee;Yunsu Kim
    • Annual Conference on Human and Language Technology / 2023.10a / pp.659-663 / 2023
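  • To improve Korean speech recognition performance, this paper organizes the conventional speech recognition process into two stages: a jamo-level (Korean alphabet unit) speech recognition model and a neural network-based jamo merging model. Because Korean writes syllables as combinations of letters, about 2,900 syllable units are needed for speech recognition. This degrades recognition performance for syllables that appear rarely in the training dataset and raises training cost. To address this, one can recognize speech at the level of 51 jamo units (ㄱ-ㅎ, ㅏ-ㅞ) instead of syllables and then merge the jamo-level output into syllable-level Hangul [1]. Jamo-level output can be merged by rules that consider the initial, medial, and final positions. However, if the recognition output contains misrecognized jamo, the final merged result contains errors. To solve this, we present a neural network-based jamo merging model. The model converts a sequence of separated jamo into complete Hangul sentences, and in doing so it can also correct errors, turning misrecognized jamo into correct Hangul sentences. Experiments used the Korean speech recognition corpus KsponSpeech, with Wav2Vec2.0 as the speech recognition model. Compared with the existing rule-based jamo merging method, the proposed model achieved relative improvements of 17.2% in character error rate (CER, measured at the syllable level) and 13.1% in word error rate (WER).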
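The rule-based merging that this abstract uses as its baseline can be sketched as follows. This is a minimal illustration, not the paper's implementation: it assumes the recognizer emits Hangul compatibility jamo, that compound finals arrive as a single jamo, and it uses the standard Unicode Hangul syllable composition formula. The function name `merge_jamo` is hypothetical.

```python
# Jamo tables in Unicode composition order (19 initials, 21 medials, 28 finals).
CHO = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
JUNG = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"
JONG = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ",
        "ㄾ", "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ",
        "ㅍ", "ㅎ"]


def merge_jamo(jamo: str) -> str:
    """Greedy rule-based merge of a jamo sequence into Hangul syllables.

    A consonant followed by a vowel opens a syllable. The next consonant is
    taken as the final only if it is NOT itself followed by a vowel, because
    in that case it must be the initial of the next syllable.
    """
    out = []
    i, n = 0, len(jamo)
    while i < n:
        c = jamo[i]
        if c in CHO and i + 1 < n and jamo[i + 1] in JUNG:
            cho = CHO.index(c)
            jung = JUNG.index(jamo[i + 1])
            jong = 0
            step = 2
            if i + 2 < n and jamo[i + 2] in JONG:
                next_is_vowel = i + 3 < n and jamo[i + 3] in JUNG
                if not next_is_vowel:
                    jong = JONG.index(jamo[i + 2])
                    step = 3
            # Unicode composition: 0xAC00 + (initial * 21 + medial) * 28 + final
            out.append(chr(0xAC00 + (cho * 21 + jung) * 28 + jong))
            i += step
        else:
            # Anything that cannot start a syllable is passed through unchanged.
            out.append(c)
            i += 1
    return "".join(out)


if __name__ == "__main__":
    print(merge_jamo("ㅎㅏㄴㄱㅡㄹ"))
    print(merge_jamo("ㄴㅏㄹㅏ"))
```

Rules like these have no way to notice a misrecognized jamo: a wrong input jamo simply yields a wrong syllable, which is the failure mode the abstract says the neural merging model is meant to correct.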

A Method of Analyzing Sentiment Polarity of Multilingual Social Media: A Case of Korean-Chinese Languages (다국어 소셜미디어에 대한 감성분석 방법 개발: 한국어-중국어를 중심으로)

  • Cui, Meina;Jin, Yoonsun;Kwon, Ohbyung
    • Journal of Intelligence and Information Systems / v.22 no.3 / pp.91-111 / 2016
  • For social media based marketing, it is crucial to perform sentiment analysis on the unstructured data written by potential consumers of a company's products and services. In particular, companies interested in global business must collect and analyze data from social media in multinational settings (e.g., YouTube, Instagram). Because the texts are multilingual, the sentences are usually translated into a target language before sentiment analysis is conducted. However, because cultural differences are not reflected and high-quality data dictionaries are lacking, translated sentences often fail to convey the true meaning, which lowers the quality of sentiment analysis. Hence, this study proposes a method for multilingual sentiment analysis, focusing on Korean-Chinese cases, that avoids language translation. To show the feasibility of the idea, we compare the performance of the proposed method with that of legacy methods that adopt language translators. The results suggest that our method outperforms them in terms of RMSE and can be applied by global business institutions.