http://dx.doi.org/10.3745/KIPSTB.2005.12B.1.081

Multilingual Story Link Detection based on Properties of Event Terms  

Lee Kyung-Soon (Division of Electronics and Information Engineering, Chonbuk National University)
Abstract
In this paper, we propose a novel approach to multilingual story link detection that uses two features, timelines and multilingual spaces, as weighting components to give distinctive weights to terms related to events. On timelines, term significance is calculated by comparing the term distribution of the documents reported on a given day with that of the whole document collection, and it is used to weight the document vectors for that day. Since two languages can provide more information than one, term significance is measured in each language space and then used as a bridge to the other language space. Evaluated on Korean and Japanese news articles, the method achieved a 14.3% improvement on the combined set of monolingual and multilingual story pairs and a 16.7% improvement on multilingual story pairs alone. Measuring the space density verifies the proposed weighting components: intra-event stories show a high density and inter-event stories show a low density. These results indicate that the proposed method is helpful for multilingual story link detection.
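The timeline weighting described in the abstract can be illustrated with a minimal sketch. The abstract does not name the statistic used to compare the daily term distribution with the collection, so the chi-square statistic below is an assumption, as are the function names (`day_term_significance`, `weighted_vector`, `cosine`) and the `tf * (1 + significance)` weighting. The multilingual bridge step is not covered here.

```python
import math
from collections import Counter


def day_term_significance(day_docs, all_docs):
    """Score terms that are over-represented on one day (assumed chi-square).

    Each document is a list of tokens; day_docs must be a subset of all_docs.
    """
    n_total, n_day = len(all_docs), len(day_docs)
    df_day = Counter(t for doc in day_docs for t in set(doc))
    df_all = Counter(t for doc in all_docs for t in set(doc))
    scores = {}
    for term, a in df_day.items():
        b = df_all[term] - a        # other-day documents containing the term
        c = n_day - a               # same-day documents without the term
        d = n_total - n_day - b     # other-day documents without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        if denom == 0 or a * d <= b * c:
            scores[term] = 0.0      # not over-represented on this day
        else:
            scores[term] = n_total * (a * d - b * c) ** 2 / denom
    return scores


def weighted_vector(doc, scores):
    """Term-frequency vector boosted by the day's term significance."""
    tf = Counter(doc)
    return {t: tf[t] * (1.0 + scores.get(t, 0.0)) for t in tf}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Under these assumptions, two stories would be judged linked when the cosine similarity of their weighted vectors exceeds a threshold tuned on held-out data. Terms that occur mainly on the day of an event dominate the vectors, which pulls same-event stories closer together.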
Keywords
topic link detection; space density; event term; topic detection and tracking; term distribution