Proceedings of the Korean Society for Language and Information Conference (한국언어정보학회:학술대회논문집)
- 2007.11a
- pp. 358-364
- 2007
Bracketing Input for Accurate Parsing
Abstract
Syntactic parsers can benefit from speakers' intuitions about constituent structure, indicated in the input string by parentheses. Focusing on languages like Korean, whose orthographic conventions write more than one word together without intervening spaces, we describe an algorithm that passes the bracketing information through the tagger to the probabilistic CFG parser, together with one that raises (or penalizes, as the case may be) the probabilities of candidate constituents as the parser proposes them. We show that marking two or three constituents in the input suffices to make the correct parse the most probable one, even for sentences usually considered long.
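The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and every name in it is hypothetical. It assumes that brackets are stripped from the raw string and recorded as character offsets. Plain whitespace tokenization stands in for the morphological tagger; a real Korean tagger would split each space-delimited word into morphemes and would have to carry these offsets through. Bracketed spans are then mapped to token ranges, and a candidate constituent's score is multiplied by a boost if it matches a bracket or by a penalty if it crosses one. The crossing test and the `boost`/`penalty` values are assumptions chosen for illustration. The adjusted values serve as ranking scores and are no longer normalized probabilities.

```python
import re
from typing import List, Tuple

# A span is a half-open interval [start, end).
Span = Tuple[int, int]


def strip_brackets(text: str) -> Tuple[str, List[Span]]:
    """Remove '(' and ')' and return the clean string plus the character
    spans the brackets enclosed (offsets refer to the clean string)."""
    clean: List[str] = []
    open_at: List[int] = []  # stack of start offsets for unmatched '('
    spans: List[Span] = []
    for ch in text:
        if ch == "(":
            open_at.append(len(clean))
        elif ch == ")":
            if not open_at:
                raise ValueError("unmatched ')'")
            spans.append((open_at.pop(), len(clean)))
        else:
            clean.append(ch)
    if open_at:
        raise ValueError("unmatched '('")
    return "".join(clean), spans


def tokenize_with_offsets(clean: str) -> List[Span]:
    """Hypothetical stand-in for the tagger: whitespace-delimited words
    with their character offsets in the clean string."""
    return [m.span() for m in re.finditer(r"\S+", clean)]


def to_token_ranges(tokens: List[Span], spans: List[Span]) -> List[Span]:
    """Map each bracketed character span to the token range [i, j) of the
    tokens lying entirely inside it."""
    ranges: List[Span] = []
    for s, e in spans:
        inside = [k for k, (ts, te) in enumerate(tokens) if s <= ts and te <= e]
        if inside:
            ranges.append((inside[0], inside[-1] + 1))
    return ranges


def crosses(a: Span, b: Span) -> bool:
    """True if the two spans overlap but neither contains the other."""
    (i, j), (k, l) = a, b
    return i < k < j < l or k < i < l < j


def adjust(score: float, span: Span, brackets: List[Span],
           boost: float = 10.0, penalty: float = 0.1) -> float:
    """Raise the score of a candidate constituent that matches a bracket,
    lower it if it crosses any bracket, and leave it unchanged otherwise.
    boost and penalty are illustrative values, not taken from the paper."""
    if any(crosses(span, b) for b in brackets):
        return score * penalty
    if span in brackets:
        return score * boost
    return score


if __name__ == "__main__":
    clean, char_spans = strip_brackets("철수가 (어제 만난 사람을) 좋아한다")
    brackets = to_token_ranges(tokenize_with_offsets(clean), char_spans)
    print(brackets)                          # token range of the bracketed phrase
    print(adjust(0.5, (1, 4), brackets))     # matches the bracket: boosted
    print(adjust(0.5, (0, 2), brackets))     # crosses the bracket: penalized
    print(adjust(0.5, (0, 5), brackets))     # contains it: unchanged
```

In a bottom-up chart parser, `adjust` would be applied each time an edge spanning tokens `[i, j)` is completed. Because crossing constituents are penalized rather than discarded, a mistaken bracket lowers a candidate's rank without making the correct parse unreachable.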
Keywords
- manually parsed corpus;
- Probabilistic Context Free Grammar;
- Korean syntax;
- bottom-up chart parser;
- pre-annotated input;
- Paak;
- KWGInterpreter