1 |
dplyr (nd). dplyr: A grammar of data manipulation. https://github.com/hadley/dplyr. Accessed on: 2016-08-27.
|
2 |
H2O.ai (nda). H2O.ai - AI for Business. http://www.h2o.ai/. Accessed on: 2016-08-30.
|
3 |
H2O.ai (ndb). Sparkling Water. http://www.h2o.ai/product/sparkling-water/. Accessed on: 2016-08-30.
|
4 |
HBase (nd). Apache HBase. https://hbase.apache.org. Accessed on: 2016-08-27.
|
5 |
Hindman, B., Konwinski, A., Zaharia, M., Ghodsi, A., Joseph, A. D., Katz, R. H., Shenker, S., and Stoica, I. (2011). Mesos: A platform for fine-grained resource sharing in the data center. In Proceedings of the 8th USENIX conference on Networked Systems Design and Implementation. USENIX Association.
|
6 |
Hunter, T., Moldovan, T., Zaharia, M., Merzgui, S., Ma, J., Franklin, M. J., Abbeel, P., and Bayen, A. M. (2011). Scaling the mobile millennium system in the cloud. In Proceedings of the 2nd ACM Symposium on Cloud Computing. ACM.
|
7 |
Kim, H., Park, J., Jang, J., and Yoon, S. (2016). DeepSpark: Spark-based deep learning supporting asynchronous updates and Caffe compatibility. arXiv preprint arXiv:1602.08191.
|
8 |
Kraska, T., Talwalkar, A., Duchi, J. C., Griffith, R., Franklin, M. J., and Jordan, M. I. (2013). MLbase: A distributed machine-learning system. In The 6th biennial Conference on Innovative Data Systems Research.
|
9 |
Lakshman, A. and Malik, P. (2010). Cassandra: a decentralized structured storage system. ACM SIGOPS Operating Systems Review, 44, 35-40.
|
10 |
Lehoucq, R. B., Sorensen, D. C., and Yang, C. (1998). ARPACK users' guide: solution of large-scale eigenvalue problems with implicitly restarted Arnoldi methods, 6, SIAM.
|
11 |
Meng, X., Bradley, J., Yavuz, B., Sparks, E., Venkataraman, S., Liu, D., et al. (2016). MLlib: Machine learning in Apache Spark. Journal of Machine Learning Research, 17, 1-7.
|
12 |
Zaharia, M., Chowdhury, M., Das, T., Dave, A., Ma, J., McCauley, M., Franklin, M. J., Shenker, S., and Stoica, I. (2012). Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing. In Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation. USENIX Association.
|
13 |
TopicModeling (nd). Topic modeling on Apache Spark. https://github.com/intel-analytics/TopicModeling. Accessed on: 2016-08-30.
|
14 |
Vavilapalli, V. K., Murthy, A. C., Douglas, C., Agarwal, S., Konar, M., Evans, R., et al. (2013). Apache Hadoop YARN: Yet another resource negotiator. In Proceedings of the 4th annual Symposium on Cloud Computing, ACM.
|
15 |
Xin, R., Deyhim, P., Ghodsi, A., Meng, X., and Zaharia, M. (2014a). GraySort on Apache Spark by Databricks. GraySort Competition.
|
16 |
Xin, R. S., Crankshaw, D., Dave, A., Gonzalez, J. E., Franklin, M. J., and Stoica, I. (2014b). GraphX: Unifying data-parallel and graph-parallel analytics. arXiv preprint arXiv:1402.2394.
|
17 |
Zadeh, R. B., Meng, X., Ulanov, A., Yavuz, B., Pu, L., Venkataraman, S., Sparks, E., Staple, A., and Zaharia, M. (2016). Matrix computations and optimization in Apache Spark. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 31-38), ACM.
|
18 |
Zaharia, M., Chowdhury, M., Franklin, M. J., Shenker, S., and Stoica, I. (2010). Spark: cluster computing with working sets. In Proceedings of the 2nd USENIX conference on Hot topics in cloud computing. USENIX Association.
|
19 |
Zaharia, M., Das, T., Li, H., Hunter, T., Shenker, S., and Stoica, I. (2013). Discretized streams: Fault-tolerant streaming computation at scale. In Proceedings of the Twenty-Fourth ACM Symposium on Operating Systems Principles (pp. 423-438), ACM.
|
20 |
Zeppelin (nd). Apache Zeppelin. https://zeppelin.apache.org/. Accessed on: 2016-08-30.
|
21 |
Armbrust, M., Xin, R. S., Lian, C., Huai, Y., Liu, D., Bradley, J. K., et al. (2015). Spark SQL: Relational data processing in Spark. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 1383-1394. ACM.
|
22 |
Bahmani, B., Moseley, B., Vattani, A., Kumar, R., and Vassilvitskii, S. (2012). Scalable k-means++. Proceedings of the VLDB Endowment, 5, 622-633.
|
23 |
Dean, J. and Ghemawat, S. (2008). MapReduce: simplified data processing on large clusters. Communications of the ACM, 51, 107-113.
|
24 |
Scala (nd). The Scala programming language. http://www.scala-lang.org. Accessed on: 2016-08-27.
|
25 |
Moritz, P., Nishihara, R., Stoica, I., and Jordan, M. I. (2015). SparkNet: Training deep networks in Spark. arXiv preprint arXiv:1511.06051.
|
26 |
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., et al. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830.
|
27 |
RStudio (nd). sparklyr: R interface for Apache Spark. http://spark.rstudio.com. Accessed on: 2016-08-27.
|
28 |
Shvachko, K., Kuang, H., Radia, S., and Chansler, R. (2010). The Hadoop distributed file system. In 2010 IEEE 26th symposium on mass storage systems and technologies (MSST), pages 1-10. IEEE.
|
29 |
Spark (nd). Apache Spark. https://spark.apache.org/. Accessed on: 2016-08-27.
|
30 |
Spark-cassandra-connector (nd). Spark Cassandra Connector. https://github.com/datastax/spark-cassandra-connector. Accessed on: 2016-08-30.
|
31 |
Spark-sklearn (nd). Scikit-learn integration package for Apache Spark. https://github.com/databricks/spark-sklearn. Accessed on: 2016-08-30.
|
32 |
Spark-tfocs (nd). TFOCS for Spark: A community port of TFOCS for Apache Spark. https://github.com/databricks/spark-tfocs. Accessed on: 2016-08-27.
|
33 |
Spark Wiki (nda). Committers. https://cwiki.apache.org/confluence/display/SPARK/Committers. Accessed on: 2016-08-27.
|
34 |
Spark Wiki (ndb). Powered By Spark. https://cwiki.apache.org/confluence/display/SPARK/Powered+By+Spark. Accessed on: 2016-08-27.
|
35 |
SparkR (nd). SparkR (R on Spark). https://spark.apache.org/docs/latest/sparkr.html. Accessed on: 2016-08-27.
|
36 |
Sparkit-learn (nd). Sparkit-learn. https://github.com/lensacom/sparkit-learn. Accessed on: 2016-08-30.
|