http://dx.doi.org/10.5391/JKIIS.2010.20.3.318

Design and Implementation of Web Crawler Wrappers to Collect User Reviews on Shopping Mall with Various Hierarchical Tree Structure  

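Kang, Han-Hoon (Department of Computer Engineering, Sejong University)
Yoo, Seong-Joon (Department of Computer Engineering, Sejong University)
Han, Dong-Il (Department of Computer Engineering, Sejong University)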
Publication Information
Journal of the Korean Institute of Intelligent Systems / v.20, no.3, 2010, pp. 318-325
Abstract
This study proposes a wrapper database description language and model for collecting product reviews from Korean shopping malls that have multi-layer structures and are built with a variety of web languages. A wrapper-based web crawler holds structural information about the target website, which lets it retrieve exactly the desired data. Previously proposed wrapper-based web crawlers collected HTML documents, and the target documents had hierarchies of only 2-3 layers. The Korean shopping malls examined in this study, however, consist not only of HTML documents but also of content written in other web languages (JavaScript, Flash, and AJAX), and they have a 5-layer hierarchical structure. To visit review pages without visiting non-review pages, a web crawler needs information about where the review pages are located. The proposed wrapper contains this location information for review pages. We also propose a language grammar for describing the location information.
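As a rough illustration of the idea in the abstract, and not the authors' actual wrapper language or grammar, the sketch below models a wrapper as one link pattern per hop of a 5-layer site and follows only links that match. The rule format, the `crawl_reviews` function, and the in-memory `SITE` are all hypothetical, and the sketch handles plain HTML links only (no JavaScript, Flash, or AJAX).

```python
import re
from typing import Callable, Dict, List, Tuple

# Hypothetical wrapper: one link pattern per hop between the five layers
# (home -> category -> product list -> product -> review page).
WRAPPER: List[str] = [
    r"/category/\d+",
    r"/list/\d+",
    r"/product/\d+",
    r"/product/\d+/reviews",
]

# Toy in-memory site standing in for a real shopping mall (an assumption).
SITE: Dict[str, str] = {
    "/": '<a href="/category/1">TV</a> <a href="/about">About</a>',
    "/category/1": '<a href="/list/1">All TVs</a> <a href="/event/9">Sale</a>',
    "/list/1": '<a href="/product/42">Model A</a> <a href="/cart">Cart</a>',
    "/product/42": '<a href="/product/42/reviews">Reviews</a> '
                   '<a href="/product/42/qna">Q&A</a>',
    "/product/42/reviews": "Great picture quality.",
}

HREF = re.compile(r'href="([^"]+)"')


def crawl_reviews(
    start_url: str, fetch: Callable[[str], str], wrapper: List[str]
) -> Tuple[List[str], List[str]]:
    """Follow only wrapper-matching links; return (review URLs, pages fetched)."""
    frontier = [start_url]
    fetched: List[str] = []
    for pattern in wrapper:
        rule = re.compile(pattern)
        next_frontier: List[str] = []
        for url in frontier:
            fetched.append(url)
            for link in HREF.findall(fetch(url)):
                # Keep a link only if the whole path matches this layer's rule.
                if rule.fullmatch(link) and link not in next_frontier:
                    next_frontier.append(link)
        frontier = next_frontier
    # After the last hop, the frontier holds the review page URLs.
    return frontier, fetched


review_urls, fetched = crawl_reviews("/", SITE.__getitem__, WRAPPER)
print(review_urls)
print(fetched)
```

In this toy setup the crawler never requests pages such as `/about`, `/cart`, or `/product/42/qna`, because no wrapper rule names them; it only passes through the intermediate pages needed to reach the review layer.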
Keywords
wrapper; shopping mall; review; opinion mining;