• Title/Summary/Keyword: parsing

Strategy to parsing Table about Event Information for integrated digital broadcasting environment (통합 디지털 방송 환경을 위한 Program Event에 대한 정보를 포함하는 Table의 통합 parsing 기법)

  • Shin, Yun-ho; Lee, Hyuk-Joon; Kim, Jung-sun
    • Proceedings of the Korea Information Processing Society Conference / 2010.11a / pp.855-858 / 2010
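  • In a digital broadcasting environment where multiple standards coexist, each standard uses a different technique for parsing the tables that carry the Program Event information needed to obtain the Electronic Program Guide (EPG). To address this and build an integrated digital broadcasting environment, we propose a unified table parsing technique that uses XML files.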

Analysis of Korean Language Parsing System and Speed Improvement of Machine Learning using Feature Module (한국어 의존 관계 분석과 자질 집합 분할을 이용한 기계학습의 성능 개선)

  • Kim, Seong-Jin; Ock, Cheol-Young
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.8 / pp.66-74 / 2014
  • Recently, software engineers and linguists have carried out a variety of studies on Korean parsing systems. These systems mainly use machine learning or the symbol-processing paradigm. However, a parsing system based on machine learning has a long training time because the Korean sentence data is very large, and its recognition rate is limited because the data itself contains errors. In this paper, we design a system using feature modules that can reduce training time, and we analyze the recognition rate by the number of training sentences and the number of repetitions. The designed system uses separated modules and a sorted table for binary search. We use 36,090 refined sentences extracted from the Sejong Corpus. Training time decreased by about three hours, and the recognition rate was highest, at 84.54%, when 10,000 sentences were trained 50 times. When all training sentences (32,481) were trained 10 times, the recognition rate was 82.99%. As a result, it is more efficient to use the refined data and to repeat training until the system reaches a steady state.

Static Analysis of Web Accessibility Based on Abstract Parsing (요약파싱기법을 사용한 웹 접근성의 정적 분석)

  • Kim, Hyunha; Doh, Kyung-Goo
    • Journal of KIISE / v.41 no.12 / pp.1099-1109 / 2014
  • Web-accessibility evaluation tools can be used to determine whether or not a website meets accessibility guidelines. Many such tools have been developed, but most of them dynamically fetch and analyze pages, and as a result some pages may be omitted due to a lack of access authorization or environment information. In this paper, we propose a static method that analyzes web accessibility via abstract parsing. Our abstract parsing technique understands the syntactic and semantic structures of programs that dynamically generate web pages according to external inputs and parameters. The static method performs its analysis without omitting any pages because it covers all execution paths. We performed an experiment with a PHP-based website to demonstrate how our tool discovers more accessibility errors than a dynamic page accessibility analysis tool.

A Two-Phase Shallow Semantic Parsing System Using Clause Boundary Information and Tree Distance (절 경계와 트리 거리를 사용한 2단계 부분 의미 분석 시스템)

  • Park, Kyung-Mi; Hwang, Kyu-Baek
    • Journal of KIISE:Computing Practices and Letters / v.16 no.5 / pp.531-540 / 2010
  • In this paper, we present a two-phase shallow semantic parsing method based on a maximum entropy model. The first phase recognizes semantic arguments, i.e., argument identification. The second phase assigns appropriate semantic roles to the recognized arguments, i.e., argument classification. Here, the performance of the first phase is crucial for the success of the entire system, because the second phase is performed on the regions recognized at the identification stage. To improve the performance of argument identification, we incorporate syntactic knowledge into its pre-processing step. More precisely, the boundaries of the immediate clause and the upper clauses of a predicate, obtained from clause identification, are used to reduce the search space. Further, the distance on parse trees from the parent node of a predicate to the parent node of a parse constituent is exploited. Experimental results show that incorporating syntactic knowledge and separating argument identification from the rest of the procedure enhance the performance of the shallow semantic parsing system.

Intra-Sentence Segmentation using Maximum Entropy Model for Efficient Parsing of English Sentences (효율적인 영어 구문 분석을 위한 최대 엔트로피 모델에 의한 문장 분할)

  • Kim Sung-Dong
    • Journal of KIISE:Software and Applications / v.32 no.5 / pp.385-395 / 2005
  • Long sentence analysis has been a critical problem in machine translation because of its high complexity. Intra-sentence segmentation methods have been proposed to reduce parsing complexity. This paper presents an intra-sentence segmentation method based on a maximum entropy probability model to increase the coverage and accuracy of the segmentation. We construct the rules for choosing candidate segmentation positions by a learning method using the lexical context of the words tagged as segmentation positions. We also generate a model that assigns a probability value to each candidate segmentation position. The lexical contexts are extracted from a corpus tagged with segmentation positions and are incorporated into the probability model. We construct training data using sentences from the Wall Street Journal and test the intra-sentence segmentation on sentences from four different domains. The experiments show about 88% accuracy and about 98% coverage of the segmentation. The proposed method also improves parsing efficiency by 4.8 times in speed and 3.6 times in space.

Functional Expansion of Morphological Analyzer Based on Longest Phrase Matching For Efficient Korean Parsing (효율적인 한국어 파싱을 위한 최장일치 기반의 형태소 분석기 기능 확장)

  • Lee, Hyeon-yoeng; Lee, Jong-seok; Kang, Byeong-do; Yang, Seung-weon
    • Journal of Digital Contents Society / v.17 no.3 / pp.203-210 / 2016
  • Korean freely allows omission of sentence elements and has a flexible modifying scope, so it is better to handle these phenomena in the morphological analyzer than in the parser. In this paper, we propose methods for functionally expanding the morphological analyzer to ease the burden of parsing. The approach is a longest phrase matching method. When a series of several morphemes forms one syntactic category through the processing of unknown words, compound verbs, compound nouns, numbers, and symbols, our method combines them into a single syntactic unit. It then treats them as a syntactic unit by assigning them semantic features. The proposed morphological analysis method removes unnecessary morphological ambiguities and decreases the number of morphological analysis results, and so improves the accuracy of the tagger and parser. In empirical results, we found that our method reduces the number of parse trees by 73.4% and parsing time by 52.4% on average.

A Study on Transport Stream Analysis and Parsing Ability Enhancement in Digital Broadcasting and Service (디지털 방송 서비스에서 트랜스포트 스트림 분석 및 파싱 능력 향상에 관한 연구)

  • Kim, Jang-Won
    • The Journal of Korea Institute of Information, Electronics, and Communication Technology / v.10 no.6 / pp.552-557 / 2017
  • Wired and wireless digital broadcasting has expanded sharply since 2010 with the advent of high-definition TV, and the use of duplex as well as simplex content has increased rapidly. The domestic satellite communications system has adopted DVB, from the European digital broadcasting standardization organization, as the standard for domestic data broadcasting, and methods for using selective content have been studied in various ways as IPTV has developed. Digital broadcasting multiplexes information into Transport Stream Packets (TSP) in order to send multimedia information such as MPEG-2 video, audio, and data; these streams include detailed information on the TV guide and programs as well as video and audio information. To aid understanding of such data broadcasting systems, this study implemented a TS analyzer that divides a transport stream (TS) into packets in a Linux environment and analyzes and prints them by function. It can help in understanding TS and in enhancing stream parsing ability.
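
The packet-splitting step described in this abstract can be sketched as follows. This is a minimal illustration and not the author's analyzer: it assumes the standard MPEG-2 TS layout (188-byte packets that begin with sync byte 0x47 and carry a 4-byte header), and the function names `split_packets` and `parse_header` are hypothetical.

```python
SYNC_BYTE = 0x47    # every MPEG-2 TS packet begins with this byte
PACKET_SIZE = 188   # fixed TS packet length in bytes


def split_packets(stream: bytes):
    """Yield 188-byte packets, skipping stray bytes until a sync byte is found."""
    pos = 0
    while pos + PACKET_SIZE <= len(stream):
        if stream[pos] != SYNC_BYTE:
            pos += 1  # resynchronize one byte at a time
            continue
        yield stream[pos:pos + PACKET_SIZE]
        pos += PACKET_SIZE


def parse_header(packet: bytes) -> dict:
    """Decode the 4-byte TS packet header into named fields."""
    b1, b2, b3 = packet[1], packet[2], packet[3]
    return {
        "transport_error": bool(b1 & 0x80),
        "payload_unit_start": bool(b1 & 0x40),
        "pid": ((b1 & 0x1F) << 8) | b2,          # 13-bit packet identifier
        "adaptation_field_control": (b3 >> 4) & 0x03,
        "continuity_counter": b3 & 0x0F,
    }
```

Grouping the parsed headers by `pid` would be the natural next step toward printing the stream "by function", since program guide tables and audio/video streams travel on different PIDs.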

Korean Probabilistic Syntactic Model using Head Co-occurrence (중심어 간의 공기정보를 이용한 한국어 확률 구문분석 모델)

  • Lee, Kong-Joo; Kim, Jae-Hoon
    • The KIPS Transactions:PartB / v.9B no.6 / pp.809-816 / 2002
  • Since natural language has inherent structural ambiguities, one of the difficulties of parsing is resolving them. Recently, probabilistic approaches to this disambiguation problem have received considerable attention because of attractions such as automatic learning, wide coverage, and robustness. In this paper, we focus on a Korean probabilistic parsing model using head co-occurrence. Because head co-occurrence is lexical, it is prone to the data sparseness problem, so handling this problem is the most important issue. To mitigate the problem, we used a restricted and simplified phrase-structure grammar and a back-off model for smoothing. The proposed model showed an accuracy of about 84%.
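
To illustrate why backing off helps with sparse lexical co-occurrence counts, here is a small sketch. It is not the paper's model: it uses a simplified fixed-discount back-off (in the style of "stupid back-off", which yields scores rather than normalized probabilities), and the names `train`, `backoff_score`, and the discount `alpha` are assumptions made for the example.

```python
from collections import Counter


def train(pairs):
    """Count (head, dependent) co-occurrences and their marginals."""
    pair_counts = Counter(pairs)
    head_counts = Counter(h for h, _ in pairs)
    dep_counts = Counter(d for _, d in pairs)
    return pair_counts, head_counts, dep_counts, len(pairs)


def backoff_score(head, dep, model, alpha=0.4):
    """Score dep given head; fall back to a discounted unigram estimate
    when the lexical pair was never observed."""
    pair_counts, head_counts, dep_counts, total = model
    if pair_counts[(head, dep)] > 0:
        # Seen pair: relative frequency of dep among head's dependents.
        return pair_counts[(head, dep)] / head_counts[head]
    if total == 0:
        return 0.0
    # Unseen pair: back off to how often dep occurs at all, discounted by alpha.
    return alpha * dep_counts[dep] / total
```

An unseen pair such as a new verb with a familiar noun then receives a small non-zero score instead of zero, so a single missing count does not rule out an otherwise plausible parse.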

Segmentation of Long Chinese Sentences using Comma Classification (쉼표의 자동분류에 따른 중국에 장문분할)

  • Jin Me-Ixun; Kim Mi-Young; Lee Jong-Hyeok
    • Journal of KIISE:Software and Applications / v.33 no.5 / pp.470-480 / 2006
  • The longer the input sentences, the worse the parsing results. To improve parsing performance, many methods for long sentence segmentation have been researched. Because Chinese is an isolating language, Chinese sentences offer fewer cues for sentence segmentation. However, the average frequency of comma usage in Chinese is higher than in other languages. The syntactic information that the comma conveys can play an important role in segmenting long Chinese sentences. This paper proposes a method for classifying commas in Chinese sentences according to the context in which the comma occurs. Sentences are then segmented using the classification result. The experimental results show that the accuracy of the comma classification reaches 87.1%, and with our segmentation model, the dependency parsing accuracy of our parser is improved by 5.6%.

Comprehension Processes and Structures of Korean Relative Clause Sentence (한국어 관계절 문장의 이해 과정과 구조)

  • 김영진
    • Korean Journal of Cognitive Science / v.6 no.2 / pp.5-27 / 1995
  • Based on data from three experiments that measured word-by-word reading times of Korean relative-clause sentences, parsing strategies and performance structures in comprehending Korean sentences were suggested. First, the significantly longer reading times for nouns than for verbs suggested that Korean parsing occurs primarily at nouns. Second, four parsing strategies were proposed to explain increased reading times, working memory loads, and parallel function effects. Third, performance structures of sentence comprehension were constructed from the interword reading time differences. The proposed strategies and structures seem to account for the patterns of word-by-word reading times of the five types of Korean relative-clause sentences, and various ideas for further experimentation were discussed.