Passage Retrieval based on Tracing Topic Continuity and Transition by Using Field-Associated Term

(Korean title: A Passage Segmentation Method That Traces Topic Continuity and Transition Using Field-Associated Terms)

  • Sang-Kon Lee (이상곤) (School of Information Technology and Computer Engineering, Jeonju University)
  • Published: 2003.02.01

Abstract

We propose a technique for extracting relevant passages from a text collection based on field-associated terms, since such terms help concentrate retrieval on the text relevant to a user's query. Documents are normally managed as wholes, without segmentation into smaller pieces; the proposed method does not depend on any auxiliary information embedded in the text and relies instead on topic continuity and transition. To obtain sentences or passages relevant to a user's needs, we present a passage retrieval technique that uses the occurrence frequency of field-associated terms to delimit text likely to be relevant to a particular topic, taking into account the continuity and transition of topics as they flow through the text. An evaluation on 50 Japanese documents confirmed the method's usefulness, with an average precision of 82% and a recall of 63%.
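The abstract does not spell out the delimiting procedure itself. As a minimal sketch, assuming a toy dictionary in which each field-associated term maps to exactly one field, the idea of using term occurrence frequency to delimit topical text could look like the following. The `FA_TERMS` entries and the `dominant_field` and `delimit_passages` functions are hypothetical illustrations, not the author's implementation.

```python
import re
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical field-associated (FA) term dictionary: term -> field.
# Illustrative entries only; the paper builds its dictionary from a classified corpus.
FA_TERMS = {
    "pitcher": "baseball",
    "inning": "baseball",
    "home run": "baseball",
    "election": "politics",
    "parliament": "politics",
    "ballot": "politics",
}


def dominant_field(sentence: str) -> Optional[str]:
    """Return the field whose FA terms occur most often in the sentence, or None."""
    text = sentence.lower()
    counts = Counter()
    for term, field in FA_TERMS.items():
        # Whole-word match so that e.g. "inning" does not fire inside "beginning".
        counts[field] += len(re.findall(r"\b" + re.escape(term) + r"\b", text))
    field, freq = counts.most_common(1)[0]
    return field if freq > 0 else None


def delimit_passages(sentences: List[str]) -> List[Tuple[Optional[str], List[str]]]:
    """Group consecutive sentences into passages that share one dominant field.

    A sentence with no FA term stays in the current passage (topic continuity);
    a sentence dominated by a different field opens a new passage (topic transition).
    """
    passages: List[Tuple[Optional[str], List[str]]] = []
    for s in sentences:
        field = dominant_field(s)
        if passages and (field is None or field == passages[-1][0]):
            passages[-1][1].append(s)
        else:
            passages.append((field, [s]))
    return passages


sentences = [
    "The pitcher threw a perfect inning.",
    "He then hit a home run.",
    "The crowd cheered loudly.",
    "Parliament scheduled the election for May.",
    "Each ballot will be counted by hand.",
]
passages = delimit_passages(sentences)
for field, sents in passages:
    print(field, len(sents))  # baseball 3, then politics 2
```

Letting sentences without any FA term inherit the current topic is one simple way to model continuity; the paper's actual boundary criterion is not given in the abstract.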

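Passage segmentation is the technique of identifying the boundaries between topics in a document in which multiple topics are mixed. It is not limited to information retrieval and can play an important role in many other fields. We perform passage segmentation using field-associated terms constructed according to a well-defined field hierarchy. A field-associated term is a word that accurately evokes a particular field, and such terms can be built from a well-classified document collection. Using these terms, we propose a semantics-based passage extraction method that extracts the parts of a document according to their related fields. Focusing on topic continuity, the method traces the thread of a topic through a continuity degree computed from the level (scope) and the consecutive occurrence of field-associated terms, while also accounting for topic transition. By clearly delimiting the passage of each topic within a document, it extracts passages according to their topic field. Experiments on 50 Japanese documents yielded 82% precision and 63% recall, indicating practical potential, and the method is expected to be applicable to Korean as well.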
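The continuity degree is described only qualitatively. The sketch below assumes one possible scoring scheme: each term carries a hypothetical level (1 = narrowest field, strongest association) and contributes a weight of 1/level, every field's score decays from one sentence to the next, and a field that recurs in consecutive sentences earns a bonus. A passage boundary is reported when another field's score overtakes the current topic's. `DECAY`, `CONSECUTIVE_BONUS`, the term levels, and `segment` are all assumptions chosen for illustration, not values or functions from the paper.

```python
import re
from collections import defaultdict

# Hypothetical FA terms: term -> (field, level). Level 1 = term tied to a narrow field;
# larger levels = broader, weaker association. Values are illustrative only.
FA_TERMS = {
    "home run": ("baseball", 1),
    "pitcher": ("baseball", 1),
    "stadium": ("baseball", 3),
    "ballot": ("politics", 1),
    "parliament": ("politics", 2),
}

DECAY = 0.5              # hypothetical: fraction of a field's score kept per sentence
CONSECUTIVE_BONUS = 1.5  # hypothetical: multiplier when a field recurs in adjacent sentences


def term_weights(sentence):
    """Sum level-based weights (1/level) per field for FA terms found in the sentence."""
    text = sentence.lower()
    weights = defaultdict(float)
    for term, (field, level) in FA_TERMS.items():
        hits = len(re.findall(r"\b" + re.escape(term) + r"\b", text))
        weights[field] += hits / level
    return weights


def segment(sentences):
    """Return sentence indices where a topic transition (passage boundary) is detected."""
    scores = defaultdict(float)  # running continuity degree per field
    prev_fields = set()
    current = None
    boundaries = []
    for i, s in enumerate(sentences):
        w = term_weights(s)
        for field in scores:  # older evidence fades
            scores[field] *= DECAY
        for field, value in w.items():
            if value == 0:
                continue
            bonus = CONSECUTIVE_BONUS if field in prev_fields else 1.0
            scores[field] += value * bonus
        prev_fields = {f for f, v in w.items() if v > 0}
        if scores:
            leader = max(scores, key=scores.get)
            if current is None:
                current = leader  # first topic observed
            elif leader != current and scores[leader] > scores[current]:
                boundaries.append(i)  # another field overtook the current topic
                current = leader
    return boundaries


sentences = [
    "The pitcher stared at the batter.",
    "A home run cleared the stadium wall.",
    "Fans left the stadium happy.",
    "Meanwhile, parliament called a snap ballot.",
    "Each ballot was counted twice.",
]
boundaries = segment(sentences)
print(boundaries)  # [3]
```

The decay lets a weak, broad term (such as "stadium" here) keep the current topic alive without dominating, while a burst of specific terms from another field triggers a transition.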
