[1] Ferrara, Emilio, et al. "Web data extraction, applications and techniques: a survey." Knowledge-Based Systems 70 (2014): 301-323.
[2] Geng, Hua, Qiang Gao, and Jingui Pan. "Extracting content for news web pages based on DOM." IJCSNS International Journal of Computer Science and Network Security 7.2 (2007): 124-129.
[3] Hedley, Jonathan. "jsoup: Java HTML Parser." https://jsoup.org/
[4] Wang, Jie, et al. "The crawling and analysis of agricultural products big data based on Jsoup." 2015 12th International Conference on Fuzzy Systems and Knowledge Discovery (FSKD). IEEE, 2015.
[5] Apache Flume. https://flume.apache.org/
[6] Apache Hadoop. http://hadoop.apache.org (2009).
[7] Borthakur, Dhruba. "HDFS architecture guide." Hadoop Apache Project, http://hadoop.apache.org/common/docs/current/hdfs_design.pdf (2008): 39.
[8] Dean, Jeffrey, and Sanjay Ghemawat. "MapReduce: simplified data processing on large clusters." Communications of the ACM 51.1 (2008): 107-113.
[9] Zaharia, Matei, et al. "Spark: Cluster computing with working sets." HotCloud 10 (2010): 10-10.
[10] Gopalani, Satish, and Rohan Arora. "Comparing Apache Spark and Map Reduce with performance analysis using K-means." International Journal of Computer Applications 113.1 (2015).
[11] Choi, Seung-jun, Jae-Won Park, Jong-Bae Kim, and Jae-Hyun Choi. "A quality evaluation model for distributed processing systems of big data." Journal of Digital Contents Society 15.4 (2014): 533-545.