http://dx.doi.org/10.5909/JBE.2019.24.6.1122

A Study on Adaptive Parallel Computability in Many-Task Computing on Hadoop Framework  

Jik-Soo Kim (Department of Computer Engineering, Myongji University)
Publication Information
Journal of Broadcast Engineering / v.24, no.6, 2019, pp. 1122-1133
Abstract
We have designed and implemented MOHA (Mtc On HAdoop), a new data processing framework that effectively supports Many-Task Computing (MTC) applications on a YARN-based Hadoop platform. An MTC application can consist of a very large number of computational tasks, ranging from hundreds of thousands to millions, and each application may have a different resource usage pattern. We have therefore implemented Adaptive Parallel Computability in the MOHA-TaskExecutor (a pilot job that executes the actual MTC application tasks), which adaptively executes multiple tasks simultaneously in order to improve the parallel computability of a YARN container and the overall system throughput. Specifically, we have implemented a multi-threaded version of the TaskExecutor that can "independently and dynamically" adjust the number of concurrently running tasks, and we employ a hill-climbing algorithm to find the optimal number of concurrent tasks.
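The abstract does not spell out how the hill-climbing search is applied, so the following is only a generic sketch of hill climbing over a concurrency level, not MOHA's actual implementation. The names `hill_climb_concurrency`, `measure_throughput`, and `synthetic_throughput` are hypothetical, and the throughput curve is a made-up model in which adding threads helps until contention starts to dominate.

```python
from typing import Callable


def hill_climb_concurrency(
    measure_throughput: Callable[[int], float],
    start: int = 1,
    step: int = 1,
    max_threads: int = 64,
    max_rounds: int = 100,
) -> int:
    """Return a locally optimal number of concurrent tasks.

    measure_throughput(n) is assumed to run with n concurrent tasks for a
    short interval and report the observed throughput (hypothetical hook).
    """
    current = max(1, min(start, max_threads))
    best = measure_throughput(current)
    for _ in range(max_rounds):
        # Probe the neighbouring concurrency levels that stay in range.
        candidates = [
            n for n in (current - step, current + step)
            if 1 <= n <= max_threads
        ]
        if not candidates:
            break
        scores = {n: measure_throughput(n) for n in candidates}
        neighbour = max(scores, key=scores.get)
        if scores[neighbour] <= best:
            break  # no neighbour improves throughput: local optimum reached
        current, best = neighbour, scores[neighbour]
    return current


def synthetic_throughput(n: int) -> float:
    """Made-up throughput model: gains from parallelism, then contention."""
    return n / (1 + 0.02 * n * n)


if __name__ == "__main__":
    print(hill_climb_concurrency(synthetic_throughput, start=1))
```

In a real executor the measurement would be noisy, so a practical variant would likely average several samples per level and re-run the search periodically as the workload changes; both are assumptions outside what the abstract states.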
Keywords
Many-Task Computing; YARN; Hadoop; MOHA; Parallel Computability