• Title/Summary/Keyword: Context-free grammar


Bracketing Input for Accurate Parsing

  • No, Yong-Kyoon
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.358-364 / 2007
  • Syntax parsers can benefit from speakers' intuition about constituent structures indicated in the input string in the form of parentheses. Focusing on languages like Korean, whose orthographic convention requires more than one word to be written without spaces, we describe an algorithm for passing the bracketing information across the tagger to the probabilistic CFG parser, together with one for heightening (or penalizing, as the case may be) probabilities of putative constituents as they are suggested by the parser. It is shown that two or three constituents marked in the input suffice to guide the parser to the correct parse as the most likely one, even with sentences that are considered long.
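
The bracket-guided rescoring described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the paper's algorithm: the function names, the `boost` and `penalty` factors, and the whitespace tokenization are all assumptions made for the example (the paper passes brackets through a Korean tagger, which is not modelled here). Spans are half-open word-index pairs.

```python
def read_brackets(text):
    """Split a parenthesized input into plain words and bracketed spans.

    Assumes brackets are whitespace-separated tokens (a simplification).
    """
    words, stack, spans = [], [], set()
    for tok in text.split():
        if tok == "(":
            stack.append(len(words))               # remember where the bracket opens
        elif tok == ")":
            spans.add((stack.pop(), len(words)))   # close the most recent bracket
        else:
            words.append(tok)
    return words, spans


def crosses(span, bracket):
    """True if the two spans overlap without either containing the other."""
    (i, j), (a, b) = span, bracket
    return (i < a < j < b) or (a < i < b < j)


def adjust(score, span, brackets, boost=2.0, penalty=0.01):
    """Rescale a putative constituent's score using the input brackets.

    boost and penalty are hypothetical values; the result is a score,
    not a normalized probability.
    """
    if span in brackets:
        return score * boost      # constituent matches a marked bracket
    if any(crosses(span, b) for b in brackets):
        return score * penalty    # constituent contradicts a marked bracket
    return score                  # brackets say nothing about this span
```

A parser would call `adjust` each time it proposes a constituent over `span`, so that analyses compatible with the marked brackets rise to the top of the ranking.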


Why Korean Is Not a Regular Language: A Proof

  • No, Yong-Kyoon
    • Language and Information / v.5 no.2 / pp.1-8 / 2001
  • Natural language string sets are known to require a grammar with a generative capacity slightly beyond that of context-free grammars. Proofs regarding the complexity of natural language have involved particular properties of languages like English, Swiss German and Bambara. While it is not very difficult to prove that Korean is more complex than the simplest of the many infinite sets, no proof of this has been given in the literature. I identify two types of center embedding in Korean and use them in proving that Korean is not a regular set, i.e. that no FSA can recognize its string set. The regular language i salam i (i salam ul)$^j$ michi (key ha)$^k$ essta is intersected with Korean, to give $\{$i salam i (i salam ul)$^j$ michi (key ha)$^k$ essta $\mid$ $j, k \geq 0$ and $j \leq k\}$. This latter language is proved to be nonregular. As the class of regular sets is closed under intersection, Korean cannot be regular.
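
The abstract does not say how nonregularity of the intersected language is established. One standard route is the pumping lemma, sketched below. It is offered as an illustration and is not necessarily the paper's own argument. The abbreviations $a, b, c, d, e$ are introduced here, and each phrase is treated as a single symbol.

```latex
Let $a=\text{i salam i}$, $b=\text{i salam ul}$, $c=\text{michi}$,
$d=\text{key ha}$, $e=\text{essta}$, and
\[
  L = \{\, a\,b^{j}\,c\,d^{k}\,e \mid j,k \ge 0,\ j \le k \,\}.
\]
Suppose $L$ is regular with pumping length $p$, and take
$w = a\,b^{p}\,c\,d^{p}\,e \in L$.
Any split $w = xyz$ with $|xy| \le p$ and $|y| \ge 1$ places $y$
inside the prefix $a\,b^{p-1}$.

Case 1: $y$ contains $a$. Then $xy^{2}z$ contains two copies of $a$,
so $xy^{2}z \notin L$.

Case 2: $y = b^{m}$ with $m \ge 1$. Then
$xy^{2}z = a\,b^{p+m}\,c\,d^{p}\,e$, and $p+m > p$ violates $j \le k$.

In both cases $xy^{2}z \notin L$, contradicting the pumping lemma,
so $L$ is not regular.
```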


Syntactic and Semantic Disambiguation for Interpretation of Numerals in the Information Retrieval (정보 검색을 위한 숫자의 해석에 관한 구문적.의미적 판별 기법)

  • Moon, Yoo-Jin
    • Journal of the Korea Society of Computer and Information / v.14 no.8 / pp.65-71 / 2009
  • Natural language processing is needed to filter efficiently the enormous volume of information produced by information retrieval on the World Wide Web. This paper proposes an algorithm for interpreting the meaning of numerals in text. The algorithm uses context-free grammars with the chart parsing technique, interprets the affixes attached to the numerals, and is designed to disambiguate their meanings systematically with the support of n-gram-based words. It is also designed to use POS (part-of-speech) taggers, to recognize restriction conditions on word trigrams automatically, and to disambiguate the meaning of the numerals incrementally. An experiment on the proposed numeral interpretation system showed that the frequency-proportional method recognized the numerals with 86.3% accuracy and the condition-proportional method with 82.8% accuracy.

A Study on Pseudo N-gram Language Models for Speech Recognition (음성인식을 위한 의사(疑似) N-gram 언어모델에 관한 연구)

  • 오세진;황철준;김범국;정호열;정현열
    • Journal of the Institute of Convergence Signal Processing / v.2 no.3 / pp.16-23 / 2001
  • In this paper, we propose pseudo n-gram language models for speech recognition with a medium-size vocabulary, in contrast to large-vocabulary speech recognition using statistical n-gram language models. The proposed method is very simple: it keeps the standard ARPA structure and sets the word probabilities arbitrarily. First, the 1-gram sets the word occurrence probability to 1 (log likelihood 0.0). Second, the 2-gram also sets the word occurrence probability to 1 and can only connect the word start symbol to WORD, and WORD to the word end symbol. Finally, the 3-gram also sets the word occurrence probability to 1 and can only connect the word start symbol, WORD and the word end symbol. To verify the effectiveness of the proposed method, word recognition experiments were carried out. The preliminary (off-line) experimental results show an average word accuracy of 97.7% for 452 words uttered by 3 male speakers. The on-line word recognition results show an average word accuracy of 92.5% for 20 words uttered by 20 male speakers, drawn from a stock-name vocabulary of 1,500 words. Through these experiments, we have verified the effectiveness of the pseudo n-gram language models for speech recognition.
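
The construction in this abstract can be sketched as a small generator for an ARPA-format file. This is a hedged illustration only. The function name `pseudo_ngram_arpa` is hypothetical. The start and end symbols are not legible in the abstract, so `<s>` and `</s>` are assumed, as is conventional in ARPA files. The 0.0 backoff weights on the lower-order entries are also an assumption.

```python
def pseudo_ngram_arpa(words, bos="<s>", eos="</s>"):
    """Build a pseudo 3-gram model in ARPA text format.

    Every entry gets log10 probability 0.0 (probability 1), following the
    abstract. bos/eos symbols and the 0.0 backoff weights are assumptions.
    """
    unigrams = [bos, eos] + list(words)
    # 2-grams may only link the start symbol to a word, or a word to the end symbol.
    bigrams = [(bos, w) for w in words] + [(w, eos) for w in words]
    # 3-grams may only be: start symbol, one word, end symbol.
    trigrams = [(bos, w, eos) for w in words]

    lines = [
        "\\data\\",
        f"ngram 1={len(unigrams)}",
        f"ngram 2={len(bigrams)}",
        f"ngram 3={len(trigrams)}",
        "",
        "\\1-grams:",
    ]
    lines += [f"0.0\t{w}\t0.0" for w in unigrams]
    lines += ["", "\\2-grams:"]
    lines += [f"0.0\t{a} {b}\t0.0" for a, b in bigrams]
    lines += ["", "\\3-grams:"]
    lines += [f"0.0\t{a} {b} {c}" for a, b, c in trigrams]
    lines += ["", "\\end\\"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical two-word vocabulary, for illustration only.
    print(pseudo_ngram_arpa(["ALPHA", "BETA"]))
```

Because every listed n-gram has probability 1, such a model acts as a constraint rather than a statistical preference: it admits exactly one vocabulary word per utterance, which suits isolated-word recognition.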


CHART PARSER FOR ILL-FORMED INPUT SENTENCES (잘못 형성된 입력문장에 대한 CHART PARSER)

  • Min, Kyongho
    • Korean Journal of Cognitive Science / v.4 no.1 / pp.177-212 / 1993
  • My research is based on the parser for ill-formed input presented by Mellish in the Proceedings of the 27th ACL meeting (1989). My system is composed of two parsers: WFCP and IFCP. When WFCP fails to give the parse tree for the input sentence, the sentence is identified as ill-formed and is parsed by IFCP for error detection and recovery at the syntactic level. My system is independent of grammatical rules. It does not take semantic ill-formedness into account. My system uses a grammar composed of 25 context-free rules. My system consists of two major parsing strategies: top-down expectation and bottom-up satisfaction. With top-down expectation, rules are retrieved under the inference condition and expanded by inactive arcs. When doing bottom-up parsing, my parser uses two modes: Left-to-Right parsing and Right-to-Left parsing. My system repairs errors successfully when the input contains an omitted word or an unknown word substituted for a valid word. Left-corner and right-corner errors are more easily detected and repaired than ill-formed sentences where the error is in the middle. The deviance note, with repair details, is kept in new inactive arcs which are generated by the error correction procedure. The implementation of my system is quite different from Mellish's. When rules are invoked, my system invokes all rules with minimal inference. My bottom-up parsing strategy uses Left-to-Right mode and Right-to-Left mode. My system is bottom-up-parsing-oriented, like the chart parser. Errors are repaired in two ways: using top-down hypotheses, and using the Need-Chart, which keeps the information on expectation and completion of goals expanded by rules. To reduce the number of top-down cycles, all rules are invoked simultaneously and this invocation information is kept in the Need-Chart. This idea will be extended to implement a multiple-error recovery system.