http://dx.doi.org/10.5391/JKIIS.2008.18.2.243

An Automatic Summarization of Call-For-Paper Documents Using a 2-Phase Hidden Markov Model

Kim, Jeong-Hyun (Mobile Communication Division, Samsung Electronics)
Park, Seong-Bae (Department of Computer Engineering, Kyungpook National University)
Lee, Sang-Jo (Department of Computer Engineering, Kyungpook National University)
Park, Se-Young (Department of Computer Engineering, Kyungpook National University)
Publication Information
Journal of the Korean Institute of Intelligent Systems, vol. 18, no. 2, 2008, pp. 243-250
Abstract
This paper proposes a system that extracts key information from call-for-paper (CFP) documents using a hidden Markov model (HMM). Although a CFP does not follow a strict form, most CFPs present their information in a relatively fixed sequence. A hidden Markov model, which has the advantage of processing sequential data, is therefore adopted to analyze CFPs. However, when CFPs are modeled intuitively with a single hidden Markov model, the boundaries of the information fields are not recognized accurately. To solve this problem, this paper proposes a two-phase hidden Markov model. In the first phase, the P-HMM (phrase hidden Markov model), which models a document as a sequence of phrases, recognizes CFP documents locally. Then the D-HMM (document hidden Markov model) captures the overall structure and information flow of the document. Experiments on 400 CFP documents gathered from the Web yield an F-score of 0.49, an improvement of 0.15 in F-measure over the intuitively modeled HMM.
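The two-phase pipeline described in the abstract, in which a phrase-level HMM first labels word tokens and a document-level HMM then reads those labels as its own observations, can be sketched with a generic Viterbi decoder. Everything below is an illustrative assumption: the state names, probability tables, and toy token sequence are not the paper's actual model or parameters.

```python
# Minimal sketch of two-phase HMM decoding for CFP information extraction.
# All states, probabilities, and observations are illustrative assumptions,
# not the parameters learned in the paper.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most likely state sequence for an observation sequence."""
    # V[t][s] = (best probability of reaching state s at time t, best path)
    V = [{s: (start_p[s] * emit_p[s].get(obs[0], 1e-9), [s]) for s in states}]
    for t in range(1, len(obs)):
        V.append({})
        for s in states:
            prob, path = max(
                (V[t - 1][p][0] * trans_p[p][s] * emit_p[s].get(obs[t], 1e-9),
                 V[t - 1][p][1] + [s])
                for p in states)
            V[t][s] = (prob, path)
    return max(V[-1].values())[1]

# Phase 1: a phrase-level HMM (P-HMM) labels word tokens with phrase types.
p_states = ["TITLE", "DATE", "OTHER"]
p_start = {"TITLE": 0.6, "DATE": 0.2, "OTHER": 0.2}
p_trans = {"TITLE": {"TITLE": 0.5, "DATE": 0.3, "OTHER": 0.2},
           "DATE":  {"TITLE": 0.1, "DATE": 0.6, "OTHER": 0.3},
           "OTHER": {"TITLE": 0.3, "DATE": 0.3, "OTHER": 0.4}}
p_emit = {"TITLE": {"conference": 0.5, "workshop": 0.4},
          "DATE":  {"june": 0.5, "2008": 0.4},
          "OTHER": {"submit": 0.5, "papers": 0.4}}

tokens = ["conference", "workshop", "june", "2008", "submit"]
phrase_labels = viterbi(tokens, p_states, p_start, p_trans, p_emit)

# Phase 2: a document-level HMM (D-HMM) treats the phrase labels as its
# observations and assigns document-level information fields, capturing
# the overall structure of the CFP.
d_states = ["CONF_NAME", "DEADLINE", "BODY"]
d_start = {"CONF_NAME": 0.7, "DEADLINE": 0.1, "BODY": 0.2}
d_trans = {"CONF_NAME": {"CONF_NAME": 0.5, "DEADLINE": 0.3, "BODY": 0.2},
           "DEADLINE":  {"CONF_NAME": 0.1, "DEADLINE": 0.5, "BODY": 0.4},
           "BODY":      {"CONF_NAME": 0.2, "DEADLINE": 0.3, "BODY": 0.5}}
d_emit = {"CONF_NAME": {"TITLE": 0.8, "DATE": 0.1, "OTHER": 0.1},
          "DEADLINE":  {"TITLE": 0.1, "DATE": 0.8, "OTHER": 0.1},
          "BODY":      {"TITLE": 0.1, "DATE": 0.1, "OTHER": 0.8}}

fields = viterbi(phrase_labels, d_states, d_start, d_trans, d_emit)
print(list(zip(tokens, phrase_labels, fields)))
```

The two-level design is what lets local phrase evidence and global document structure be modeled separately: the P-HMM only has to discriminate phrase types, while the D-HMM only has to learn the typical ordering of fields within a CFP.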
Keywords
information extraction; call-for-paper documents; hidden Markov model; 2-phase learning