Tightly Coupled Integration of Ranking SVM and RDBMS  

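Song, Jae-Hwan (Technology Services Division, LG CNS Co., Ltd.)
Oh, Jin-Oh (Department of Computer Science and Engineering, Pohang University of Science and Technology)
Yang, Eun-Seok (Department of Computer Science and Engineering, Pohang University of Science and Technology)
Yu, Hwan-Jo (Department of Computer Science and Engineering, Pohang University of Science and Technology)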
Abstract
Rank learning and rank processing have received considerable attention in the information retrieval (IR) and data mining communities over the last decade. Other data mining techniques, such as classification and regression, have been actively studied for interoperation with an RDBMS through either tightly coupled or loosely coupled approaches, whereas ranking has been studied independently, without integration into an RDBMS. This paper proposes a tightly coupled integration of Ranking SVM into MySQL so that the rank learning task can be performed efficiently within the RDBMS. We implemented new SQL commands for learning ranking functions and predicting ranking scores. We evaluated our tightly coupled integration of Ranking SVM by comparing it with a loosely coupled implementation. The experimental results show that our approach improves performance by 10% to 40% in the training phase and by 60% in the prediction phase.
Keywords
Data mining; Ranking SVM; RDBMS; Tightly coupled integration; DMQL; Ranking SQL