• Title/Summary/Keyword: Korean parsing


An SSD-Based Directory Parsing with the Counting Bloom Filter (카운팅 블룸필터를 이용한 SSD 기반의 디렉토리 탐색 기법)

  • Kim, Man-Yun;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference / 2014.07a / pp.347-349 / 2014
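  • The explosive growth of data has brought us into the era of big data. A big-data file system contains countless directories and files organized in a very large tree structure. Searching this large tree for the directories and files a user requests is a very difficult task. We therefore propose a directory search technique that uses a counting Bloom filter. SDP (SSD-based Directory Parsing) is an SSD-based cache that holds the metadata of recently or frequently accessed directories and files. In a large-scale file system, when a user requests a file, the file system accesses the storage device several times to look up metadata. To prevent such inefficient accesses to the SSD, we propose a technique that uses a counting Bloom filter to look up metadata quickly and efficiently.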

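As an illustration of the idea in this abstract, the following is a minimal sketch of a counting Bloom filter placed in front of a metadata cache. It is not the SDP implementation: the class name, the `lookup_metadata` helper, the slot and hash counts, and the use of plain dictionaries for the SSD cache and the backing store are all assumptions made for the example.

```python
import hashlib


class CountingBloomFilter:
    """Bloom filter variant with one counter per slot, so entries can be removed."""

    def __init__(self, num_slots=1024, num_hashes=3):
        self.num_slots = num_slots
        self.num_hashes = num_hashes
        self.counters = [0] * num_slots

    def _slots(self, key):
        # Derive num_hashes slot indexes by salting SHA-256 (an illustrative choice).
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_slots

    def add(self, key):
        for slot in self._slots(key):
            self.counters[slot] += 1

    def remove(self, key):
        # Decrement only when the key may be present, so counters never go negative.
        if key in self:
            for slot in self._slots(key):
                self.counters[slot] -= 1

    def __contains__(self, key):
        # False means "definitely absent"; True means "possibly present".
        return all(self.counters[slot] > 0 for slot in self._slots(key))


def lookup_metadata(path, cache_filter, ssd_cache, backing_store):
    """Consult the in-memory filter first; touch the SSD cache only on a possible hit."""
    if path in cache_filter and path in ssd_cache:
        return ssd_cache[path]
    metadata = backing_store[path]  # slow path: fetch from main storage
    ssd_cache[path] = metadata
    cache_filter.add(path)
    return metadata
```

The counters are what distinguish this from a plain Bloom filter: when an entry is evicted from the cache, `remove` can undo its contribution without rebuilding the whole filter.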

Korean Dependency Parsing Using Various Ensemble Models (다양한 앙상블 알고리즘을 이용한 한국어 의존 구문 분석)

  • Jo, Gyeong-Cheol;Kim, Ju-Wan;Kim, Gyun-Yeop;Park, Seong-Jin;Gang, Sang-U
    • Annual Conference on Human and Language Technology / 2019.10a / pp.543-545 / 2019
  • This paper combines recent Korean dependency parsing models with various ensemble models and analyzes their performance. For word representations, we use a pretrained word embedding model, ELMo (Embedding from Language Model), BERT (Bidirectional Encoder Representations from Transformer), and various additional features. The dependency parsing models used are the Stack Pointer Network Model, the Deep Biaffine Attention Parser, and the Left to Right Pointer Parser. Finally, we combine the parsing results of each model using the ensemble methods Bagging and XGBoost (Extreme Gradient Boosting) to propose an optimal model.

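To make the ensemble idea concrete, here is a minimal sketch that merges the head predictions of several parsers by per-token majority vote. This is a simpler stand-in, not the Bagging or XGBoost setup the abstract describes; the function name `vote_heads` and the tie-breaking rule are assumptions for illustration.

```python
from collections import Counter


def vote_heads(predictions):
    """Combine per-token head predictions from several parsers by majority vote.

    predictions: one head sequence per parser; heads[i] is the head position
    predicted for the i-th token, with 0 standing for the artificial root.
    Ties go to the parser listed first.
    """
    if not predictions:
        raise ValueError("need at least one parser output")
    length = len(predictions[0])
    if any(len(heads) != length for heads in predictions):
        raise ValueError("all parsers must label the same tokens")

    combined = []
    for token_votes in zip(*predictions):
        counts = Counter(token_votes)
        best = max(counts.values())
        # Among the most frequent heads, keep the first-listed parser's choice.
        combined.append(next(h for h in token_votes if counts[h] == best))
    return combined
```

Voting token by token does not guarantee that the combined heads form a valid tree, so a real system would typically add a decoding step or, as in the abstract, learn the combination.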

Prediction of reduction goals: deterministic approach (리덕션 골의 예상: 결정적인 접근 방법)

  • 이경옥
    • Journal of KIISE:Software and Applications / v.30 no.5_6 / pp.461-465 / 2003
  • The technique of reduction goal prediction in LR parsing has several applications such as the computation of right context. An LR parser generating the set of pre-determined reduction goals was previously suggested. The set approach is nondeterministic, and so it is inappropriate in some applications. This paper suggests a deterministic technique to give a uniquely predictable reduction symbol.

Improved Deep Biaffine Attention for Korean Dependency Parsing (한국어 의존 구문 분석을 위한 개선된 Deep Biaffine Attention)

  • O, Dongsuk;Woo, Jongseong;Lee, Byungwoo;Kim, Kyungsun
    • Annual Conference on Human and Language Technology / 2018.10a / pp.608-610 / 2018
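  • Korean dependency parsing is a natural language analysis method that represents the dependency relations between the head and the modifier among the eojeols (word phrases) of a sentence. Recently, models that combine an attention mechanism with LSTM (Long Short Term Memory) have shown high performance in representing these dependency relations. In this paper, we propose an improved Biaffine Attention dependency parsing model. The proposed model improves how the existing Biaffine Attention determines dependencies and dependency relations, and extends the morpheme representation of the input sequence for Korean dependency parsing, achieving a UAS (Unlabeled Attachment Score) 0.15 percentage points higher than the existing model.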

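For readers unfamiliar with biaffine arc scoring and UAS, the sketch below shows the standard formulation in plain Python. It illustrates the baseline scoring form only, not the improved model proposed in the paper; the function names, the greedy decoder, and the toy vectors are assumptions, and in a real parser the head and dependent vectors would come from MLPs over BiLSTM states.

```python
def biaffine_arc_scores(head_vecs, dep_vecs, U, u):
    """Return scores[i][j]: the score of token j being the head of token i.

    score(i, j) = head_vecs[j]^T U dep_vecs[i] + head_vecs[j]^T u
    """
    scores = []
    for dep in dep_vecs:
        # U @ dep, computed once per dependent token.
        projected = [sum(w * d for w, d in zip(row, dep)) for row in U]
        row_scores = []
        for head in head_vecs:
            bilinear = sum(h * p for h, p in zip(head, projected))
            bias = sum(h * b for h, b in zip(head, u))
            row_scores.append(bilinear + bias)
        scores.append(row_scores)
    return scores


def predict_heads(scores):
    """Greedy decoding: each token takes its highest-scoring head."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


def unlabeled_attachment_score(predicted, gold):
    """UAS: fraction of tokens whose predicted head matches the gold head."""
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)
```

The bias term `head_vecs[j]^T u` lets the model express how likely a token is to be a head at all, independent of any particular dependent.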

A Study of Main Contents Extraction from Web News Pages based on XPath Analysis

  • Sun, Bok-Keun
    • Journal of the Korea Society of Computer and Information / v.20 no.7 / pp.1-7 / 2015
  • Data on the internet can serve many fields, for example as a data source for information retrieval (IR), data mining, and knowledge information services, but it also contains a lot of unnecessary information. Removing this unnecessary data is a problem that must be solved before studying knowledge-based information services built on web page data. In this paper, we solve the problem by implementing XTractor (XPath Extractor). Since XPath is used to navigate the elements and attributes of an XML document, XTractor carries out an XPath analysis: it extracts the main text by parsing the HTML, grouping XPaths, and detecting the XPath that contains the main data. In experiments on a large amount of data, the recognition and precision rates were 97.9% and 93.9%, respectively, except for a few cases, confirming that the main text of news pages can be extracted properly.
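The parse, group, and detect steps described in this abstract can be sketched roughly as follows. This is not XTractor itself: the class and function names are hypothetical, the paths are simplified tag paths without indexes or attributes, and "the group with the most characters" is an assumed heuristic for finding the main content.

```python
from collections import defaultdict
from html.parser import HTMLParser

VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}
SKIPPED_TAGS = {"script", "style"}


class PathTextCollector(HTMLParser):
    """Group text nodes by the tag path (a simplified XPath) leading to them."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.groups = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        # Void elements have no closing tag, so they never enter the stack.
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Tolerate sloppy markup: unwind to the nearest matching open tag.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and not SKIPPED_TAGS.intersection(self.stack):
            self.groups["/" + "/".join(self.stack)].append(text)


def extract_main_text(html):
    """Return (path, text) for the path group holding the most characters."""
    collector = PathTextCollector()
    collector.feed(html)
    collector.close()
    if not collector.groups:
        return None, ""
    path = max(collector.groups,
               key=lambda p: sum(len(t) for t in collector.groups[p]))
    return path, "\n".join(collector.groups[path])
```

On a typical news page, navigation links and footers land in many small groups, while article paragraphs share one path and accumulate the most text, which is why grouping by path can separate the two.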

Prediction of Prosodic Boundaries Using Dependency Relation

  • Kim, Yeon-Jun;Oh, Yung-Hwan
    • The Journal of the Acoustical Society of Korea / v.18 no.4E / pp.26-30 / 1999
  • This paper introduces a prosodic phrasing method for Korean to improve the naturalness of speech synthesis, especially in text-to-speech conversion. Prosodic phrasing requires understanding the structure of a sentence through language processing steps such as part-of-speech (POS) tagging and parsing, since syntactic structure correlates with the prosodic structure of speech better than other factors do. In this paper, the prosodic phrasing procedure is treated from two perspectives: dependency parsing and prosodic phrasing using dependency relations. This approach is appropriate for Ural-Altaic languages, since a prosodic boundary in speech usually coincides with the governor of a dependency relation. In experiments on a speech corpus consisting of 300 sentences uttered by 3 speakers, the proposed method achieved a 12% improvement in prosodic boundary prediction accuracy.


UML diagram-driven test scenarios generation based on the temporal graph grammar

  • Shi, Zhan;Zeng, Xiaoqin;Zhang, Tingting;Han, Lei;Qian, Ying
    • KSII Transactions on Internet and Information Systems (TIIS) / v.15 no.7 / pp.2476-2495 / 2021
  • Model-based software architecture verification and test scenario generation are becoming increasingly important in the software industry. Building on the existing temporal graph grammar, this paper proposes a new formalization of context-sensitive graph grammar aimed at UML activity diagrams, called the UML Activity Graph Grammar (UAGG). The UAGG introduces new definitions and parsing algorithms. The proposed mechanisms can not only check the structural correctness of a UML activity diagram but also automatically generate test scenarios according to user constraints. Finally, a case study illustrates how the UAGG and its algorithms work.

Range Detection of Wa/Kwa Parallel Noun Phrase using a Probabilistic Model and Modification Information (확률모형과 수식정보를 이용한 와/과 병렬사구 범위결정)

  • Choi, Yong-Seok;Shin, Ji-Ae;Choi, Key-Sun
    • Journal of KIISE:Software and Applications / v.35 no.2 / pp.128-136 / 2008
  • Recognizing parallel structure at an early stage of sentence parsing can reduce the complexity of parsing. In this paper, we propose an unsupervised, language-independent probabilistic model for recognizing parallel noun structures. The proposed model is based on the idea of swapping constituents, which relies on two properties of parallel structures: symmetry (two or more identical constituents are repeated) and reversibility (the order of constituents is interchangeable). Non-symmetric patterns that the general symmetry rule cannot capture are resolved additionally using modifier information. In particular, this paper shows how the proposed model is applied to recognize Korean parallel noun phrases connected by the "wa/kwa" particle. Our model is compared with other models, including supervised models, and performs better at recognizing parallel noun phrases.

An Extraction Method of Bibliographic Information from the US Patents: Using an HTML Parsing Technique (미국 특허 서지정보 추출 방법에 대한 연구: HTML 파싱 기법의 활용을 중심으로)

  • Han, Yoo-Jin;Oh, Seung-Woo
    • Journal of the Korean Society for Information Management / v.27 no.2 / pp.7-20 / 2010
  • This study aims to provide a method of extracting the most recent information on US patent documents. It adopts an HTML parsing technique that can connect directly to the US Patent and Trademark Office (USPTO) Web page. After obtaining a list of 50 documents through a keyword search, the study suggests an algorithm that uses HTML parsing techniques to extract the patent number, the applicant, and the US patent class information. The study also presents an algorithm that extracts both patents and subsequent patents using their closely connected relationship, which is a very distinctive characteristic of US patent documents. Although the proposed method has several limitations, it can supplement existing databases effectively in terms of timeliness and comprehensiveness.

Development of Collaborative Script Analysis Platform Based on Web for Information Retrieval Related to Story (스토리 정보의 검색을 위한 웹 기반의 협업적 스크립트 분석 플랫폼 개발)

  • Park, Seung-Bo;Kim, Hyun-Sik;Baek, Yeong-Tae;You, Eun-Soon
    • Journal of the Korea Society of Computer and Information / v.19 no.9 / pp.93-101 / 2014
  • Movie stories can be retrieved efficiently by analyzing the script, which is the blueprint of a movie. Although movie scripts are written in the formatted structure of Final Draft, the scripts available on websites are mostly broken, so it is hard to restore the type of each sentence without analyzing its content. It is therefore necessary to develop and provide web-based script analysis software that lets users collaboratively and freely check and correct errors in the results after the script has been parsed automatically. In this paper, we propose the structure of a web-based collaborative script analysis platform that enables users to modify and filter type errors in the script in order to accumulate high-quality film data, and we evaluate the performance of the implementation. In the experiment, automatic parsing reached an accuracy of 64.95%, while collaborative modification raised parsing accuracy to 99.58%, with most errors corrected after 5 steps of modification.