Syntactic and Semantic Disambiguation for the Interpretation of Numerals in Information Retrieval

  • Yoo-Jin Moon (Department of Management Information Systems, Hankuk University of Foreign Studies)
  • Published: 2009.08.31

Abstract

Natural language processing is essential for efficiently filtering the enormous amount of information returned by information retrieval on the World Wide Web. This paper proposes a disambiguation technique for determining the meaning of numerals in text. The technique uses context-free grammars with chart parsing to interpret numeral strings and the affixes attached to them, and it is designed to disambiguate their meanings systematically on the basis of n-gram words. The system also uses a POS (part-of-speech) tagger so that the constraints on trigram words are recognized automatically, which makes the disambiguation of numerals progressively more efficient. In experiments with the proposed numeral interpretation system, the frequency-proportional method achieved 86.3% accuracy and the condition-proportional method achieved 82.8% accuracy.
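The abstract does not spell out how the frequency-proportional method scores candidate meanings, so the following is only a minimal sketch under stated assumptions. It treats the numeral as the middle slot of a trigram, counts how often each sense was seen between a given left and right word, and scores each sense in proportion to that frequency. The sense labels, the example data, and the function names `train` and `disambiguate` are all hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict

# Hypothetical labeled examples: (word before numeral, word after numeral, sense).
# These are illustrative only and are not data from the paper.
EXAMPLES = [
    ("in", "the", "YEAR"),
    ("in", "the", "YEAR"),
    ("in", "the", "QUANTITY"),
    ("costs", "dollars", "MONEY"),
    ("costs", "dollars", "MONEY"),
    ("costs", "dollars", "MONEY"),
    ("at", "pm", "TIME"),
]


def train(examples):
    """Count sense frequencies per trigram context and overall."""
    by_context = defaultdict(Counter)
    overall = Counter()
    for left, right, sense in examples:
        by_context[(left, right)][sense] += 1
        overall[sense] += 1
    return by_context, overall


def disambiguate(left, right, by_context, overall):
    """Score each sense in proportion to its observed frequency.

    Assumption: for an unseen context, back off to the overall sense counts.
    """
    counts = by_context.get((left, right)) or overall
    total = sum(counts.values())
    scores = {sense: n / total for sense, n in counts.items()}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    by_context, overall = train(EXAMPLES)
    print(disambiguate("in", "the", by_context, overall))
    print(disambiguate("about", "people", by_context, overall))
```

In this sketch the left and right words stand in for the trigram constraints that the paper's system derives automatically with a POS tagger; a fuller version would use tags or affix categories produced by the chart parser instead of raw words.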
