Ontology-based Automated Metadata Generation Considering Semantic Ambiguity

의미 중의성을 고려한 온톨로지 기반 메타데이타의 자동 생성

  • Published : 2006.11.15

Abstract

Semantic Web-based metadata is increasingly necessary for helping computers understand and efficiently manage the information that has grown with the Internet. Generating metadata, however, inevitably encounters semantically ambiguous information, and this problem needs a solution. This paper proposes a method for automated metadata generation that uses a probability model of consecutive words to assign an ambiguous word embedded in information such as a document to the concept (class) to which it is most closely related. The method considers ambiguities among the concepts defined in an ontology and uses the Hidden Markov Model to recognize words that form part of a named entity. First, we construct Markov models for recognizing the named entities of each class defined in the ontology. Next, we generate from the text an appropriate context for understanding the meaning of a semantically ambiguous word, and resolve the ambiguity during metadata generation by searching for the optimal Markov model for the sequence of words in that context. We experiment with seven semantically ambiguous words extracted from computer science theses. The results show that accuracy improves by about 18% compared with SemTag, which is known as an effective application for assigning a specific meaning to an ambiguous word based on its context.

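Semantic Web-based metadata is essential for computers to understand and efficiently manage the information that has become vast with the growth of the Internet. However, information with semantic ambiguity arises when metadata is generated, and a solution to this problem is needed. This paper proposes a method that uses a probability model of words that can occur in sequence to generate metadata for a semantically ambiguous word contained in information such as a document, assigning it to the concept of the most closely related model. When generating metadata, the proposed method considers ambiguity among the concepts defined in the ontology and uses the Hidden Markov Model to recognize words that form part of a named entity. First, Markov models are built to recognize the instances of each class defined in the ontology. Next, context that reveals the meaning of a semantically ambiguous word is generated from the document, and the ambiguity problem in metadata generation is resolved by finding the optimal Markov model for the order of the words in the context. The proposed method was evaluated on seven semantically ambiguous words extracted from computer science theses. The results showed about 38% better accuracy than SemTag, which selects the meaning of an ambiguous word as the most frequent semantic class among the semantic classes of the entities present in the context.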
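
The model-selection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it swaps in a plain first-order Markov chain with add-one smoothing for the paper's Hidden Markov Model, and the class names (`Language`, `Place`), the training phrases, and the function names are all hypothetical, chosen only for illustration. One model is trained per ontology class, and an ambiguous word is assigned to the class whose model gives the context word sequence the highest probability.

```python
import math
from collections import Counter, defaultdict

START = "<s>"  # hypothetical sentence-start symbol


def train_markov_model(sequences, vocab_size, alpha=1.0):
    """Return a smoothed first-order transition log-probability function."""
    counts = defaultdict(Counter)
    for seq in sequences:
        prev = START
        for word in seq:
            counts[prev][word] += 1
            prev = word

    def log_prob(prev, word):
        # Add-alpha smoothing so unseen transitions keep a nonzero probability.
        total = sum(counts[prev].values())
        return math.log((counts[prev][word] + alpha) / (total + alpha * vocab_size))

    return log_prob


def score(log_prob, context):
    """Log-probability of the context word sequence under one class model."""
    prev, total = START, 0.0
    for word in context:
        total += log_prob(prev, word)
        prev = word
    return total


def disambiguate(models, context):
    """Pick the class whose model best explains the context."""
    return max(models, key=lambda cls: score(models[cls], context))


# Toy training phrases per ontology class (illustrative data only).
training = {
    "Language": [["java", "programming", "language"], ["java", "virtual", "machine"]],
    "Place": [["java", "island", "indonesia"], ["visit", "java", "island"]],
}
vocab = {w for seqs in training.values() for seq in seqs for w in seq}
models = {cls: train_markov_model(seqs, len(vocab)) for cls, seqs in training.items()}

print(disambiguate(models, ["java", "virtual", "machine"]))  # Language
print(disambiguate(models, ["java", "island"]))              # Place
```

Scoring in log space avoids numerical underflow on longer contexts, and smoothing keeps a single unseen word pair from ruling out a class entirely.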

References

  1. Euzenat, J., 'Eight questions about Semantic Web annotations,' IEEE Intelligent Systems, Vol. 17, No.2, pp.55-62, 2002 https://doi.org/10.1109/MIS.2002.999221
  2. Berners-Lee, T., Hendler, J. and Lassila, O., 'The Semantic Web,' Scientific American, 2001
  3. Fensel, D., Hendler, J., Lieberman, H. and Wahlster, W., Spinning the Semantic Web, MIT Press, 2003
  4. Antoniou, G. and Van Harmelen, F., A Semantic Web Primer, MIT Press, 2004
  5. Popov, B., Kiryakov, A., Ognyanoff, D., Manov, D. and Kirilov, A., 'KIM - a semantic platform for information extraction and retrieval,' Journal of Natural Language Engineering, Vol. 10, Issue 3-4, pp. 375-392, 2004 https://doi.org/10.1017/S135132490400347X
  6. Dill, S., Eiron, N., Gibson, D., Gruhl, D., Guha, R., et al., 'Semtag and Seeker: Bootstrapping the semantic web via automated semantic annotation,' WWW 2003, 2003 https://doi.org/10.1145/775152.775178
  7. Guha, R. and McCool, R., Tap: Towards a Web of Data. http://tap.stanford.edu/
  8. Cunningham, H., Maynard, D., Bontcheva, K. and Tablan, V., 'GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications,' Proceedings of the 40th Anniversary Meeting of the Association for Computational Linguistics (ACL'02), 2002
  9. Bontcheva, K., Maynard, D., Cunningham, H. and Saggion, H., 'Using Human Language Technology for Automatic Annotation and Indexing of Digital Library Content,' ECDL'2002, 2002
  10. Miller, D., Leek, T. and Schwartz, R., 'A Hidden Markov Model Information Retrieval System,' Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 214-221, 1999 https://doi.org/10.1145/312624.312680
  11. Gruber, T., 'Toward Principles for the Design of Ontologies Used for Knowledge Sharing,' Stanford Knowledge Systems Laboratory, 1993
  12. Smith, M., Welty, C. and McGuinness, D., 'OWL Web Ontology Language Guide,' W3C Recommendation, 10 February 2004. http://www.w3.org/TR/2004/REC-owl-guide-20040210/
  13. Fikes, R., Jenkins, J. and Zhou, Q., 'Including Domain-Specific Reasoners with Reusable Ontologies,' Proceedings of the 2003 International Conference on Information and Knowledge Engineering, 2003
  14. Manola, F. and Miller, E., 'RDF Primer,' W3C Working Draft 23 January 2003
  15. Seaborne, A., 'Jena Tutorial: A Programmer's Introduction to RDQL,' April 2002
  16. Ranganathan, A. and Campbell, R., 'A Middleware for Context-Aware Agents in Ubiquitous Computing Environments,' ACM/IFIP/USENIX International Middleware Conference 2004, 2004
  17. terms, http://www.terms.co.kr/
  18. AI Study, http://www.aistudy.co.kr/