• Title/Summary/Keyword: Parallelizing Compiler

An Efficient Loop Splitting Method on Single Loop with Non-uniform Dependences (비균일 단일루프에서의 효율적인 루프 분할 방법)

  • Jeong, Sam-Jin
    • The Journal of the Korea Contents Association / v.5 no.4 / pp.204-211 / 2005
  • This paper reviews three previously developed loop splitting methods for exploiting parallelism from single loops: the minimum dependence distance method, Polychronopoulos' method, and the first dependence method, and points out several problems with each. We extend the first dependence method, the most effective of the three, and propose a more powerful loop splitting method that enhances parallelism in single loops. The proposed algorithm resolves several problems that the first dependence method has, such as anti-flow dependences and the case g = gcd(a, c) > 1. (A minimal sketch of the shared splitting idea follows this entry.)

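The splitting idea these methods share can be pictured concretely. The C sketch below is an illustration of the minimum dependence distance idea, not the paper's extended algorithm; the loop, the array sizes, and the value of DMIN are assumptions worked out by hand for this hypothetical example. Once every dependence is known to span at least DMIN iterations, blocks of DMIN consecutive iterations carry no internal dependence, so the blocks run serially while the iterations inside each block run in parallel.

```c
/* Loop-splitting sketch (minimum dependence distance idea).  For the
 * hypothetical loop
 *     for (i = 0; i < N; i++)  A[3*i + 10] = A[i] + 1.0;
 * the only dependences are flow dependences from iteration i (write of
 * A[3*i+10]) to iteration 3*i + 10 (read of the same element), so the
 * distance is 2*i + 10 >= 10 = DMIN.
 * Build with: gcc -fopenmp split.c  (the pragma is ignored otherwise). */
#include <stdio.h>

#define N    100
#define DMIN 10            /* minimum dependence distance, derived above */

static double A[4 * N];    /* sized for the largest index 3*(N-1) + 10 */

int main(void) {
    for (int lo = 0; lo < N; lo += DMIN) {   /* blocks run serially */
        int hi = (lo + DMIN < N) ? lo + DMIN : N;
        #pragma omp parallel for             /* iterations inside a block
                                                are independent */
        for (int i = lo; i < hi; i++)
            A[3 * i + 10] = A[i] + 1.0;
    }
    printf("A[10] = %f\n", A[10]);
    return 0;
}
```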

Parallelism for Nested Loops with Simple Subscripts

  • Jeong, Sam-Jin
    • International Journal of Contents / v.4 no.4 / pp.1-6 / 2008
  • In this paper, we propose an improved loop splitting method for maximizing the parallelism of single loops with non-constant dependence distances. Using the iteration and distance of the source of the first dependence, together with the theorems we define, we present a generalized and optimal algorithm for single loops with non-uniform dependences (MPSL). By extending the MPSL method, we also exploit parallelism from nested loops with simple subscripts, based on cycle shrinking and loop interchanging. The algorithms show how to transform general single loops with non-uniform dependences, as well as nested loops with simple subscripts, into parallel loops. (A cycle shrinking sketch follows this entry.)
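Cycle shrinking, which this paper builds on, is easiest to see on a nested loop with constant dependence distances. The C sketch below is a simplified stand-in, not the MPSL algorithm itself; the loop body and the distance vector (2, 3) are assumptions chosen for illustration. Shrinking the outer loop by its distance yields serial bands of two i-values, and no dependence connects iterations inside a band, so a band's iterations run in parallel.

```c
/* Cycle shrinking sketch on a hypothetical doubly nested loop
 *     for (i) for (j)  A[i + 2][j + 3] = A[i][j] + 1.0;
 * whose single flow dependence has constant distance vector (2, 3).
 * Every dependence crosses from one band of two i-values into a later
 * band, so each band is internally dependence-free. */
#define N 100   /* assumed even, so bands of width 2 divide evenly */
#define M 200

static double A[N + 2][M + 3];   /* sized for the largest indices written */

void cycle_shrunk(void) {
    for (int i = 0; i < N; i += 2) {          /* serial: step = outer distance */
        #pragma omp parallel for collapse(2)  /* whole band in parallel */
        for (int ii = i; ii < i + 2; ii++)
            for (int j = 0; j < M; j++)
                A[ii + 2][j + 3] = A[ii][j] + 1.0;
    }
}
```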

Improving the speed of deep neural networks using the multi-core and single instruction multiple data technology (다중 코어 및 single instruction multiple data 기술을 이용한 심층 신경망 속도 향상)

  • Chung, Ik Joo; Kim, Seung Hi
    • The Journal of the Acoustical Society of Korea / v.36 no.6 / pp.425-435 / 2017
  • In this paper, we propose optimization methods for speeding up the feedforward pass of deep neural networks using NEON SIMD (Single Instruction Multiple Data) parallel instructions and multi-core parallelization on a multi-core ARM processor. We report the speed improvement and the arithmetic precision stage by stage as each optimization is applied. Through optimization with SIMD parallel instructions on a single core, we obtain a 2.6x speedup over the baseline implementation built with a C compiler. By further parallelizing the single-core implementation across multiple cores, we obtain a 5.7x to 7.7x speedup. These results show the feasibility of applying arithmetic-intensive deep neural network technology to applications on mobile devices. (A NEON/multi-core sketch of the core kernel follows this entry.)
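The combination of SIMD and multi-core parallelism described above can be sketched for the dominant kernel of a feedforward pass, the dense matrix-vector product. The C sketch below is an illustrative stand-in rather than the authors' implementation: each output neuron's dot product is vectorized four floats at a time with NEON intrinsics, and the independent neurons are split across cores with OpenMP. The function name, dimensions, and row-major layout are assumptions; it targets ARM with NEON (compile with -mfpu=neon -fopenmp, or any AArch64 toolchain with -fopenmp).

```c
#include <arm_neon.h>

/* One dense layer: y = W * x, where W is out_dim x in_dim, row-major. */
void dense_forward(const float *W, const float *x, float *y,
                   int out_dim, int in_dim)
{
    #pragma omp parallel for        /* output neurons are independent:
                                       distribute them across cores */
    for (int o = 0; o < out_dim; o++) {
        const float *w = W + (long)o * in_dim;
        float32x4_t acc = vdupq_n_f32(0.0f);
        int i = 0;
        for (; i + 4 <= in_dim; i += 4)   /* 4-wide multiply-accumulate */
            acc = vmlaq_f32(acc, vld1q_f32(w + i), vld1q_f32(x + i));
        /* horizontal sum of the four partial sums */
        float32x2_t s = vadd_f32(vget_low_f32(acc), vget_high_f32(acc));
        float sum = vget_lane_f32(vpadd_f32(s, s), 0);
        for (; i < in_dim; i++)           /* scalar remainder */
            sum += w[i] * x[i];
        y[o] = sum;
    }
}
```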