• Title/Summary/Keyword: Korean Parsing

Search Result 325, Processing Time 0.026 seconds

Functional Expansion of Morphological Analyzer Based on Longest Phrase Matching For Efficient Korean Parsing (효율적인 한국어 파싱을 위한 최장일치 기반의 형태소 분석기 기능 확장)

  • Lee, Hyeon-yoeng;Lee, Jong-seok;Kang, Byeong-do;Yang, Seung-weon
    • Journal of Digital Contents Society
    • /
    • v.17 no.3
    • /
    • pp.203-210
    • /
    • 2016
  • Korean is free of omission of sentence elements and modifying scope, so managing it on morphological analyzer is better than parser. In this paper, we propose functional expansion methods of the morphological analyzer to ease the burden of parsing. This method is a longest phrase matching method. When the series of several morpheme have one syntax category by processing of Unknown-words, Compound verbs, Compound nouns, Numbers and Symbols, our method combines them into a syntactic unit. And then, it is to treat by giving them a semantic features as syntax unit. The proposed morphological analysis method removes unnecessary morphological ambiguities and deceases results of morphological analysis, so improves accuracy of tagger and parser. By empirical results, we found that our method deceases 73.4% of Parsing tree and 52.4% of parsing time on average.

Korean Probabilistic Syntactic Model using Head Co-occurrence (중심어 간의 공기정보를 이용한 한국어 확률 구문분석 모델)

  • Lee, Kong-Joo;Kim, Jae-Hoon
    • The KIPS Transactions:PartB
    • /
    • v.9B no.6
    • /
    • pp.809-816
    • /
    • 2002
  • Since a natural language has inherently structural ambiguities, one of the difficulties of parsing is resolving the structural ambiguities. Recently, a probabilistic approach to tackle this disambiguation problem has received considerable attention because it has some attractions such as automatic learning, wide-coverage, and robustness. In this paper, we focus on Korean probabilistic parsing model using head co-occurrence. We are apt to meet the data sparseness problem when we're using head co-occurrence because it is lexical. Therefore, how to handle this problem is more important than others. To lighten the problem, we have used the restricted and simplified phrase-structure grammar and back-off model as smoothing. The proposed model has showed that the accuracy is about 84%.

Korean Transition-based Dependency Parsing with Recurrent Neural Network (순환 신경망을 이용한 전이 기반 한국어 의존 구문 분석)

  • Li, Jianri;Lee, Jong-Hyeok
    • KIISE Transactions on Computing Practices
    • /
    • v.21 no.8
    • /
    • pp.567-571
    • /
    • 2015
  • Transition-based dependency parsing requires much time and efforts to design and select features from a very large number of possible combinations. Recent studies have successfully applied Multi-Layer Perceptrons (MLP) to find solutions to this problem and to reduce the data sparseness. However, most of these methods have adopted greedy search and can only consider a limited amount of information from the context window. In this study, we use a Recurrent Neural Network to handle long dependencies between sub dependency trees of current state and current transition action. The results indicate that our method provided a higher accuracy (UAS) than an MLP based model.

Static Analysis of Web Accessibility Based on Abstract Parsing (요약파싱기법을 사용한 웹 접근성의 정적 분석)

  • Kim, Hyunha;Doh, Kyung-Goo
    • Journal of KIISE
    • /
    • v.41 no.12
    • /
    • pp.1099-1109
    • /
    • 2014
  • Web-accessibility evaluation tools can be used to determine whether or not a website meets accessibility guidelines. As such, many such tools have been developed for web accessibility, but most of them dynamically fetch and analyze pages and as a result, some pages maybe omitted due to the lack of access authorization or environment information. In this paper, we propose a static method that analyzes web accessibility via abstract parsing. Our abstract parsing technique understands syntactic and semantic program structures that dynamically generate web pages according to external inputs and parameters. The static method performs its analysis without omitting any pages because it covers all execution paths. We performed an experiment with a PHP-based website to demonstrate how our tool discovers more accessibility errors than a dynamic page accessibility analysis tool.

A Two-Phase Shallow Semantic Parsing System Using Clause Boundary Information and Tree Distance (절 경계와 트리 거리를 사용한 2단계 부분 의미 분석 시스템)

  • Park, Kyung-Mi;Hwang, Kyu-Baek
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.5
    • /
    • pp.531-540
    • /
    • 2010
  • In this paper, we present a two-phase shallow semantic parsing method based on a maximum entropy model. The first phase is to recognize semantic arguments, i.e., argument identification. The second phase is to assign appropriate semantic roles to the recognized arguments, i.e., argument classification. Here, the performance of the first phase is crucial for the success of the entire system, because the second phase is performed on the regions recognized at the identification stage. In order to improve performances of the argument identification, we incorporate syntactic knowledge into its pre-processing step. More precisely, boundaries of the immediate clause and the upper clauses of a predicate obtained from clause identification are utilized for reducing the search space. Further, the distance on parse trees from the parent node of a predicate to the parent node of a parse constituent is exploited. Experimental results show that incorporation of syntactic knowledge and the separation of argument identification from the entire procedure enhance performances of the shallow semantic parsing system.

Intra-Sentence Segmentation using Maximum Entropy Model for Efficient Parsing of English Sentences (효율적인 영어 구문 분석을 위한 최대 엔트로피 모델에 의한 문장 분할)

  • Kim Sung-Dong
    • Journal of KIISE:Software and Applications
    • /
    • v.32 no.5
    • /
    • pp.385-395
    • /
    • 2005
  • Long sentence analysis has been a critical problem in machine translation because of high complexity. The methods of intra-sentence segmentation have been proposed to reduce parsing complexity. This paper presents the intra-sentence segmentation method based on maximum entropy probability model to increase the coverage and accuracy of the segmentation. We construct the rules for choosing candidate segmentation positions by a teaming method using the lexical context of the words tagged as segmentation position. We also generate the model that gives probability value to each candidate segmentation positions. The lexical contexts are extracted from the corpus tagged with segmentation positions and are incorporated into the probability model. We construct training data using the sentences from Wall Street Journal and experiment the intra-sentence segmentation on the sentences from four different domains. The experiments show about $88\%$ accuracy and about $98\%$ coverage of the segmentation. Also, the proposed method results in parsing efficiency improvement by 4.8 times in speed and 3.6 times in space.

Processing of syntactic dependency in Korean relative clauses: Evidence from an eye-tracking study (안구이동추적을 통해 살펴본 관계절의 통사처리 과정)

  • Lee, Mi-Seon;Yong, Nam-Seok
    • Korean Journal of Cognitive Science
    • /
    • v.20 no.4
    • /
    • pp.507-533
    • /
    • 2009
  • This paper examines the time course and processing patterns of filler-gap dependencies in Korean relative clauses, using an eyetracking method. Participants listened to a short story while viewing four pictures of entities mentioned in the story. Each story is followed by an auditorily presented question involving a relative clause (subject relative or dative relative). Participants' eye movements in response to the question were recorded. Results showed that the proportion of looks to the picture corresponding to a filler noun significantly increased at the relative verb affixed with a relativizer, and was largest at the filler where the fixation duration on the filler picture significantly increased. These results suggest that online resolution of the filler-gap dependency only starts at the relative verb marked with a relativiser and is finally completed at the filler position. Accordingly, they partly support the filler-driven parsing strategy for Korean, as for head-initial languages. In addition, the different patterns of eye movements between subject relatives and dative relatives indicate the role of case markers in parsing Korean sentences.

  • PDF

Segmentation of Long Chinese Sentences using Comma Classification (쉼표의 자동분류에 따른 중국에 장문분할)

  • Jin Me-Ixun;Kim Mi-Young;Lee Jong-Hyeok
    • Journal of KIISE:Software and Applications
    • /
    • v.33 no.5
    • /
    • pp.470-480
    • /
    • 2006
  • The longer the input sentences, the worse the parsing results. To improve the parsing performance, many methods about long sentence segmentation have been reserarched. As an isolating language, Chinese sentence has fewer cues for sentence segmentation. However, the average frequency of comma usage in Chinese is higher than that of other languages. The syntactic information that the comma conveys can play an important role in long sentence segmentation of Chinese languages. This paper proposes a method for classifying commas in Chinese sentences according to the context where the comma occurs. Then, sentences are segmented using the classification result. The experimental results show that the accuracy of the comma classification reaches 87.1%, and with our segmentation model, the dependency parsing accuracy of our parser is improved by 5.6%.

Empirical Comparison of Deep Learning Networks on Backbone Method of Human Pose Estimation

  • Rim, Beanbonyka;Kim, Junseob;Choi, Yoo-Joo;Hong, Min
    • Journal of Internet Computing and Services
    • /
    • v.21 no.5
    • /
    • pp.21-29
    • /
    • 2020
  • Accurate estimation of human pose relies on backbone method in which its role is to extract feature map. Up to dated, the method of backbone feature extraction is conducted by the plain convolutional neural networks named by CNN and the residual neural networks named by Resnet, both of which have various architectures and performances. The CNN family network such as VGG which is well-known as a multiple stacked hidden layers architecture of deep learning methods, is base and simple while Resnet which is a bottleneck layers architecture yields fewer parameters and outperform. They have achieved inspired results as a backbone network in human pose estimation. However, they were used then followed by different pose estimation networks named by pose parsing module. Therefore, in this paper, we present a comparison between the plain CNN family network (VGG) and bottleneck network (Resnet) as a backbone method in the same pose parsing module. We investigate their performances such as number of parameters, loss score, precision and recall. We experiment them in the bottom-up method of human pose estimation system by adapted the pose parsing module of openpose. Our experimental results show that the backbone method using VGG network outperforms the Resent network with fewer parameter, lower loss score and higher accuracy of precision and recall.

UEPF:A blockchain based Uniform Encoding and Parsing Framework in multi-cloud environments

  • Tao, Dehao;Yang, Zhen;Qin, Xuanmei;Li, Qi;Huang, Yongfeng;Luo, Yubo
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.15 no.8
    • /
    • pp.2849-2864
    • /
    • 2021
  • The emerging of cloud data sharing can create great values, especially in multi-cloud environments. However, "data island" between different cloud service providers (CSPs) has drawn trust problem in data sharing, causing contradictions with the increasing sharing need of cloud data users. And how to ensure the data value for both data owner and data user before sharing, is another challenge limiting massive data sharing in the multi-cloud environments. To solve the problems above, we propose a Uniform Encoding and Parsing Framework (UEPF) with blockchain to support trustworthy and valuable data sharing. We design namespace-based unique identifier pair to support data description corresponding with data in multi-cloud, and build a blockchain-based data encoding protocol to manage the metadata with identifier pair in the blockchain ledger. To share data in multi-cloud, we build a data parsing protocol with smart contract to query and get the sharing cloud data efficiently. We also build identifier updating protocol to satisfy the dynamicity of data, and data check protocol to ensure the validity of data. Theoretical analysis and experiment results show that UEPF is pretty efficient.