A Study on the DB-IR Integration: Per-Document Basis Online Index Maintenance

  • Jin, Du-Seok (Department of Information Technology Research, Korea Institute of Science and Technology Information)
  • Jung, Hoe-Kyung (Department of Computer Engineering, Paichai University)
  • Published : 2009.09.30

Abstract

Although database (DB) and information retrieval (IR) technologies have developed independently, there is a growing need for information systems that support both data management and efficient text retrieval, in domains such as health care, customer support, XML data management, and digital libraries. The divide between DB and IR has led to different approaches to index maintenance for newly arriving documents. DB systems have extended the SQL layer to cope with text fields because they lack a native mechanism for building an IR-like index, whereas IR systems usually treat a block of new documents as the logical unit of index maintenance because they have no concept of integrity constraints. In DB-IR integration, however, a transaction that adds or updates a document should include maintenance of the posting lists associated with that document. Although DB-IR integration has begun to emerge in the research field, it will remain a difficult and rewarding area for some time. One of the primary reasons is the lack of efficient online transactional index maintenance. This paper evaluates the performance of three strategies for per-document transactional index maintenance: direct index update, pulsing auxiliary index, and posting segmentation index. The results show that the pulsing auxiliary strategy and the posting segmentation indexing scheme are promising candidates for text field indexing in DB-IR integration.
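
The requirement that a document and its posting lists be maintained in one transaction can be illustrated with a minimal sketch. This is not the paper's implementation: the names `TinyDocStore`, `add_document`, `search`, and `_append_posting` are hypothetical, the index lives in memory, and the pulsing auxiliary and posting segmentation strategies are not modeled. It only shows the per-document, all-or-nothing behavior, which is closest in spirit to a direct index update.

```python
from collections import defaultdict


class TinyDocStore:
    """Toy document table plus inverted index (hypothetical, in-memory only)."""

    def __init__(self):
        self.docs = {}                      # doc_id -> text
        self.postings = defaultdict(list)   # term -> doc_ids in insertion order

    def _append_posting(self, term, doc_id):
        # Single place where a posting list grows; kept separate so that a
        # failure during index maintenance can be simulated by overriding it.
        self.postings[term].append(doc_id)

    def add_document(self, doc_id, text):
        """Insert the document and its postings as one all-or-nothing unit."""
        if doc_id in self.docs:
            # Integrity constraint: document identifiers must be unique.
            raise KeyError(f"duplicate doc_id: {doc_id}")
        terms = sorted(set(text.lower().split()))
        touched = []                        # undo log for this transaction
        try:
            self.docs[doc_id] = text
            for term in terms:
                self._append_posting(term, doc_id)
                touched.append(term)
        except Exception:
            # Roll back: remove every posting appended so far, then the row.
            for term in touched:
                self.postings[term].pop()
                if not self.postings[term]:
                    del self.postings[term]
            self.docs.pop(doc_id, None)
            raise

    def search(self, term):
        """Return the doc_ids whose text contains the given term."""
        return list(self.postings.get(term.lower(), []))


store = TinyDocStore()
store.add_document(1, "DB IR integration")
store.add_document(2, "IR index maintenance")
print(store.search("ir"))
```

In a block-based IR setting the posting lists would instead be updated once per batch of documents, so a failure could leave the document table and the index out of step; the undo log above is one simple way to keep the two consistent per document.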
