• Title/Summary/Keyword: Parsing Method

Search Result 150, Processing Time 0.028 seconds

An Extraction Method of Bibliographic Information from the US Patents: Using an HTML Parsing Technique (미국 특허 서지정보 추출 방법에 대한 연구: HTML 파싱 기법의 활용을 중심으로)

  • Han, Yoo-Jin;Oh, Seung-Woo
    • Journal of the Korean Society for information Management
    • /
    • v.27 no.2
    • /
    • pp.7-20
    • /
    • 2010
  • This study aims to provide a method of extracting the most recent information on US patent documents. An HTML paring technique that can directly connect to the US Patent and Trademark Office (USPTO) Web page is adopted. After obtaining a list of 50 documents through a keyword searching method, this study suggested an algorithm, using HTML parsing techniques, which can extract a patent number, an applicant, and the US patent class information. The study also revealed an algorithm by which we can extract both patents and subsequent patents using their closely connected relationship, that is a very distinctive characteristic of US patent documents. Although the proposed method has several limitations, it can supplement existing databases effectively in terms of timeliness and comprehensiveness.

Prediction of Prosodic Boundaries Using Dependency Relation

  • Kim, Yeon-Jun;Oh, Yung-Hwan
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.4E
    • /
    • pp.26-30
    • /
    • 1999
  • This paper introduces a prosodic phrasing method in Korean to improve the naturalness of speech synthesis, especially in text-to-speech conversion. In prosodic phrasing, it is necessary to understand the structure of a sentence through a language processing procedure, such as part-of-speech (POS) tagging and parsing, since syntactic structure correlates better with the prosodic structure of speech than with other factors. In this paper, the prosodic phrasing procedure is treated from two perspectives: dependency parsing and prosodic phrasing using dependency relations. This is appropriate for Ural-Altaic, since a prosodic boundary in speech usually concurs with a governor of dependency relation. From experimental results, using the proposed method achieved 12% improvement in prosody boundary prediction accuracy with a speech corpus consisting 300 sentences uttered by 3 speakers.

  • PDF

Segmentation of Long Chinese Sentences using Comma Classification (쉼표의 자동분류에 따른 중국에 장문분할)

  • Jin Me-Ixun;Kim Mi-Young;Lee Jong-Hyeok
    • Journal of KIISE:Software and Applications
    • /
    • v.33 no.5
    • /
    • pp.470-480
    • /
    • 2006
  • The longer the input sentences, the worse the parsing results. To improve the parsing performance, many methods about long sentence segmentation have been reserarched. As an isolating language, Chinese sentence has fewer cues for sentence segmentation. However, the average frequency of comma usage in Chinese is higher than that of other languages. The syntactic information that the comma conveys can play an important role in long sentence segmentation of Chinese languages. This paper proposes a method for classifying commas in Chinese sentences according to the context where the comma occurs. Then, sentences are segmented using the classification result. The experimental results show that the accuracy of the comma classification reaches 87.1%, and with our segmentation model, the dependency parsing accuracy of our parser is improved by 5.6%.

Syntactic Analysis of Korean Sentence for Machine Translation (한국어의 Machine translation을 위한 구문 구조 분석)

  • Lee, Ju-Geun;Han, Seong-Guk;Jeon, Byeong-Dae
    • Journal of the Korean Institute of Telematics and Electronics
    • /
    • v.18 no.5
    • /
    • pp.15-21
    • /
    • 1981
  • This paper deals with the syntactic analysis algorithms of Korean sentence and system for machine translation. The parts of speech and constituients are syntactically analized at unified view-points and then an effective classification algorithm is proposed. The constituients which are applied an inverse movement transformation algorithm are processed with the concept of attribute. Syntactic analysis system is constructed to generate parsing table including the deep structure of sentence by lexicon proper to the combinational property of Korean and breadth-first searching method. The results obtained from the system program are shown as the parsing table of source sentences.

  • PDF

Korean Transition-based Dependency Parsing with Recurrent Neural Network (순환 신경망을 이용한 전이 기반 한국어 의존 구문 분석)

  • Li, Jianri;Lee, Jong-Hyeok
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.8
    • /
    • pp.567-571
    • /
    • 2015
  • Transition-based dependency parsing requires much time and efforts to design and select features from a very large number of possible combinations. Recent studies have successfully applied Multi-Layer Perceptrons (MLP) to find solutions to this problem and to reduce the data sparseness. However, most of these methods have adopted greedy search and can only consider a limited amount of information from the context window. In this study, we use a Recurrent Neural Network to handle long dependencies between sub dependency trees of current state and current transition action. The results indicate that our method provided a higher accuracy (UAS) than an MLP based model.

Korean Dependency Parsing Using Sequential Parsing Method Based on Pointer Network (순차적 구문 분석 방법을 반영한 포인터 네트워크 기반의 한국어 의존 구문 분석기)

  • Han, Janghoon;Park, Yeongjoon;Jeong, Younghoon;Lee, Inkwon;Han, Jungwook;Park, Seojun;Kim, Juae;Seo, Jeongyeon
    • Annual Conference on Human and Language Technology
    • /
    • 2019.10a
    • /
    • pp.533-536
    • /
    • 2019
  • 의존 구문 분석은 문장 구성 성분 간의 의존 관계를 분석하는 태스크로, 자연어 이해의 대표적인 과제 중 하나이다. 본 논문에서는 한국어 의존 구문 분석의 성능 향상을 위해 Deep Bi-Affine Network와 Left to Right Dependency Parser를 적용하고, 새롭게 한국어의 언어적 특징을 반영한 Right to Left Dependency Parser 모델을 제안한다. 3개의 의존 구문 분석 모델에 단어 표현을 생성하는 방법으로 ELMo, BERT 임베딩 방법을 적용하고 여러 종류의 모델을 앙상블하여 세종 의존 구문 분석 데이터에 대해 UAS 94.50, LAS 92.46 성능을 얻을 수 있었다.

  • PDF

A Study of Parsing System Implementation Using Segmentation and Argument Information (구간 분할과 논항정보를 이용한 구문분석시스템 구현에 관한 연구)

  • Park, Yong Uk;Kwon, Hyuk Chul
    • Journal of Korea Multimedia Society
    • /
    • v.16 no.3
    • /
    • pp.366-374
    • /
    • 2013
  • One of the most important problems in syntactic analysis is syntactic ambiguities. This paper proposes a parsing system and this system can reduce syntactic ambiguities by using segmentation method and argument information method. The proposed system uses morphemes for the input of syntax analysis system, and syntactic analysis system generates all possible parse trees from the given morphemes. Therefore, this system generates many syntactic ambiguity problems. We use three methods to solve these problems. First is disambiguation method in morphological analysis, second is segmentation method in syntactic analysis processing, and the last method is using argument information. Using these three methods, we can reduce many ambiguities in Korean syntactic analysis. In our experiment, our approach decreases about 53% of syntactic ambiguities.

Analysis of Korean Language Parsing System and Speed Improvement of Machine Learning using Feature Module (한국어 의존 관계 분석과 자질 집합 분할을 이용한 기계학습의 성능 개선)

  • Kim, Seong-Jin;Ock, Cheol-Young
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.8
    • /
    • pp.66-74
    • /
    • 2014
  • Recently a variety of study of Korean parsing system is carried out by many software engineers and linguists. The parsing system mainly uses the method of machine learning or symbol processing paradigm. But the parsing system using machine learning has long training time because the data of Korean sentence is very big. And the system shows the limited recognition rate because the data has self error. In this thesis we design system using feature module which can reduce training time and analyze the recognized rate each the number of training sentences and repetition times. The designed system uses the separated modules and sorted table for binary search. We use the refined 36,090 sentences which is extracted by Sejong Corpus. The training time is decreased about three hours and the comparison of recognized rate is the highest as 84.54% when 10,000 sentences is trained 50 times. When all training sentence(32,481) is trained 10 times, the recognition rate is 82.99%. As a result it is more efficient that the system is used the refined data and is repeated the training until it became the steady state.

A Two-Phase Shallow Semantic Parsing System Using Clause Boundary Information and Tree Distance (절 경계와 트리 거리를 사용한 2단계 부분 의미 분석 시스템)

  • Park, Kyung-Mi;Hwang, Kyu-Baek
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.5
    • /
    • pp.531-540
    • /
    • 2010
  • In this paper, we present a two-phase shallow semantic parsing method based on a maximum entropy model. The first phase is to recognize semantic arguments, i.e., argument identification. The second phase is to assign appropriate semantic roles to the recognized arguments, i.e., argument classification. Here, the performance of the first phase is crucial for the success of the entire system, because the second phase is performed on the regions recognized at the identification stage. In order to improve performances of the argument identification, we incorporate syntactic knowledge into its pre-processing step. More precisely, boundaries of the immediate clause and the upper clauses of a predicate obtained from clause identification are utilized for reducing the search space. Further, the distance on parse trees from the parent node of a predicate to the parent node of a parse constituent is exploited. Experimental results show that incorporation of syntactic knowledge and the separation of argument identification from the entire procedure enhance performances of the shallow semantic parsing system.

Continuous Digit Recognition Using the Weight Initialization and LR Parser

  • Choi, Ki-Hoon;Lee, Seong-Kwon;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea
    • /
    • v.15 no.2E
    • /
    • pp.14-23
    • /
    • 1996
  • This paper is a on the neural network to recognize the phonemes, the weight initialization to reduce learning speed, and LR parser for continuous speech recognition. The neural network spots the phonemes in continuous speech and LR parser parses the output of neural network. The whole phonemes recognized in neural network are divided into several groups which are grouped by the similarity of phonemes, and then each group consists of neural network. Each group of neural network to recognize the phonemes consisits of that recognize the phonemes of their own group and VGNN(Verify Group Neural Network) which judges whether the inputs are their own group or not. The weights of neural network are not initialized with random values but initialized from learning data to reduce learning speed. The LR parsing method applied to this paper is not a method which traces a unique path, but one which traces several possible paths because the output of neural network is not accurate. The parser processes the continuous speech frame by frame as accumulating the output of neural network through several possible paths. If this accumulated path-value drops below the threshold value, this path is deleted in possible parsing paths. This paper applies the continuous speech recognition system to the threshold value, this path is deleted in possible parsing paths. This paper applies the continuous speech recognition system to the continuous Korea digits recognition. The recognition rate of isolated digits is 97% in speaker dependent, and 75% in speaker dependent. The recognition rate of continuous digits is 74% in spaker dependent.

  • PDF