Project Information
Supervising agency for the research project: Small and Medium Business Administration (중소기업청)