Search | Korea Science

Hybrid parallel programming for Heterogeneous Multi-core performance optimization (헤테로지니어스 멀티코어 성능 최적화를 위한 하이브리드 병렬 프로그래밍)

Lim, Ju-Ho
- Proceedings of the Korean Information Science Society Conference
- 2012.06a
- pp.7-9
- 2012
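CPUs long pursued higher performance by raising the clock speed of single-core designs, but once that approach reached its limits they evolved into multicore designs with several cores on one chip. To improve performance further, CPUs are now being combined with GPUs, which were originally built for 3D graphics computation. Combining a CPU with a GPU has shown much better performance than combining CPUs, consumes less power, and is also more cost-effective. In this paper, we combine and tune existing parallelization models to optimize performance on a heterogeneous multicore system of CPU and GPU. On the CPU, we experimented with combining existing parallel programming models: SIMD, the shared-memory parallel programming model, and the message-passing parallel programming model. On the GPU, we optimized CUDA. By optimizing and combining the CPU and GPU in this way, we propose a heterogeneous multicore performance optimization method for applications that require high-performance computation.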

Parallel damage detection through finite frequency changes on multicore processors

Messina, Arcangelo;Cafaro, Massimo
- Structural Engineering and Mechanics
- v.63 no.4
- pp.457-469
- 2017
This manuscript presents a novel approach to identifying multiple damaged sites in structural components through finite frequency changes. Natural frequencies, treated as a privileged set of modal data, are used together with a numerical model of the system. Working with finite changes makes it possible to address characteristic problems of damage detection techniques efficiently, such as the unintended comparison of shifted modes and the significance of changes in modal data, which are very often affected by experimental or environmental noise. The new procedure extends MDLAC and exploits parallel computing on modern multicore processors. Smart filters, which narrow down the set of potentially damaged sites, are implemented to reduce the computational effort. Several use cases illustrate the potential of the new damage detection procedure.
https://doi.org/10.12989/sem.2017.63.4.457

Parallelizing H.264 and AES Collectively

Kim, Heegon;Lee, Sungju;Chung, Yongwha;Pan, Sung Bum
- KSII Transactions on Internet and Information Systems (TIIS)
- v.7 no.9
- pp.2326-2337
- 2013
Many applications can be parallelized on multicore platforms. We propose a load-balancing technique for parallelizing a whole application whose first module (H.264) is data independent and whose second module (AES) is data dependent. Instead of distributing the first module symmetrically over the multicore platform, we distribute the data-independent workload asymmetrically so that the data-dependent workload can start as early as possible. Experimental results with a compression/encryption application confirm that asymmetric load balancing can outperform typical symmetric load balancing.
https://doi.org/10.3837/tiis.2013.09.015
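
The benefit of an asymmetric split can be illustrated with a toy timing model. This is a sketch under stated assumptions, not the technique from the paper: each core processes one chunk of the data-independent stage in parallel, the data-dependent stage must process the chunks strictly in order, and the per-unit costs `t1` and `t2` as well as the function name `pipeline_makespan` are hypothetical.

```python
def pipeline_makespan(chunks, t1, t2):
    """Finish time of a two-stage pipeline under a toy model (assumption).

    chunks: work units given to each core for the data-independent stage.
    t1:     time per unit in the first stage; all cores start at time 0.
    t2:     time per unit in the second stage, which handles chunks in order,
            one at a time, because each chunk depends on the previous one.
    """
    finish = 0.0
    for size in chunks:
        stage1_done = size * t1           # this core finishes its own chunk
        start = max(stage1_done, finish)  # wait for the data and the previous chunk
        finish = start + size * t2
    return finish


symmetric = pipeline_makespan([4, 4, 4], t1=1.0, t2=0.5)
asymmetric = pipeline_makespan([2, 4, 6], t1=1.0, t2=0.5)
print(symmetric, asymmetric)
```

In this model the symmetric split leaves the second stage idle until every core has finished, while the asymmetric split lets it start on the small first chunk while the larger chunks are still in the first stage.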

An Implementation and Performance Evaluation of Fast Web Crawler with Python

Kim, Cheong Ghil
- Journal of the Semiconductor & Display Technology
- v.18 no.3
- pp.140-143
- 2019
The Internet has expanded constantly and greatly, so that we now have a vast number of web pages that change dynamically. In particular, the fast development of wireless communication technology and the wide spread of smart devices allow information to be created quickly and changed anywhere, anytime. In this situation, web crawling, also known as web scraping, is an organized, automated way of systematically navigating pages on the web and automatically searching and indexing their information, and it has inevitably come into broad use in many fields. This paper implements a prototype web crawler in Python and improves its execution speed using threads on a multicore CPU. The implementation was verified by crawling reference web sites, and the performance improvement was confirmed by measuring execution speed under different thread configurations on a multicore CPU.
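
A thread-based crawler of this general kind might be sketched as follows. This is a minimal illustration, not the paper's implementation: the names `crawl`, `timed_crawl`, and `fake_fetch` are hypothetical, and `fake_fetch` is a stand-in that sleeps instead of touching the network, so the sketch runs offline.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def crawl(urls, fetch, num_threads):
    """Fetch every URL with a pool of worker threads; return {url: page}."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() preserves input order, so results line up with urls.
        pages = list(pool.map(fetch, urls))
    return dict(zip(urls, pages))


def timed_crawl(urls, fetch, num_threads):
    """Run crawl() and also report the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    result = crawl(urls, fetch, num_threads)
    return result, time.perf_counter() - start


def fake_fetch(url):
    """Stand-in for a real HTTP request (assumption): simulate I/O latency."""
    time.sleep(0.01)
    return f"<html>{url}</html>"


if __name__ == "__main__":
    urls = [f"page-{i}" for i in range(20)]
    for n in (1, 2, 4, 8):
        _, elapsed = timed_crawl(urls, fake_fetch, n)
        print(f"{n} threads: {elapsed:.3f} s")
```

Passing the fetch function as a parameter keeps the thread configuration separate from the I/O, so a real HTTP client can be substituted for `fake_fetch` without changing `crawl`.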

Performance evaluation of mobile multicore devices on threading in converting JPEG to animated GIF (JPEG을 Animated GIF로 변환하는 과정에서 스레딩에 따른 멀티코어 모바일 디바이스의 성능 평가)

Woo, Hosung;Kim, Kangseok;Kim, Jai-Hoon
- Proceedings of the Korea Information Processing Society Conference
- 2013.05a
- pp.328-331
- 2013
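In this paper, to determine the optimal thread configuration on multicore mobile devices, we used an image codec and analyzed encoding time as a function of the number of threads in various environments. The encoding uses quantization to convert JPEG files into a single GIF file, and we measured it on dual-core and quad-core devices while increasing the number of threads on each. On the dual-core device, four threads gave the most efficient performance; on the quad-core device, three threads did. Our analysis concluded that performance is not proportional to the number of threads and that the thread count does not greatly affect performance. It is efficient to choose an appropriate number of threads according to core performance, I/O performance, and data size.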
https://doi.org/10.3745/PKIPS.y2013m05a.328

ETS: Efficient Task Scheduler for Per-Core DVFS Enabled Multicore Processors

Hong, Jeongkyu
- Journal of information and communication convergence engineering
- v.18 no.4
- pp.222-229
- 2020
Recent multi-core processors for smart devices use per-core dynamic voltage and frequency scaling (DVFS), which enables independent voltage and frequency control of each core. However, because the conventional task scheduler was originally designed for processors without per-core DVFS, it cannot effectively utilize per-core DVFS and simply allocates tasks evenly across all cores, balancing core utilization under the same CPU frequency. Hence, we propose a novel task scheduler that effectively utilizes per-core DVFS and lets each core run at an appropriate frequency, thereby improving performance and decreasing energy consumption. The proposed scheduler classifies applications into two types based on performance sensitivity and gives a performance-sensitive application a dedicated core, which maximizes core utilization. Experimental evaluations on a real off-the-shelf smart device showed that, compared to the conventional task scheduler, the proposed scheduler reduced CPU energy by 13.6% on average (up to 28.3%) and execution time by 3.4% on average (up to 24.5%).
https://doi.org/10.6109/jicce.2020.18.4.222

Power-efficient Scheduling of Periodic Real-time Tasks on Lightly Loaded Multicore Processors (저부하 멀티코어 프로세서에서 주기적 실시간 작업들의 저전력 스케쥴링)

Lee, Wan-Yeon
- Journal of the Korea Society of Computer and Information
- v.17 no.8
- pp.11-19
- 2012
In this paper, we propose a power-efficient scheduling scheme for lightly loaded multicore processors, which contain more processing cores than running tasks. The proposed scheme activates a portion of the available cores and deactivates the unused cores in order to save power. The tasks are assigned to the activated cores using a heuristic mechanism for fast task assignment. Each activated core executes its assigned tasks at the optimal clock frequency, which minimizes the power consumption of the tasks while meeting their deadlines. Evaluation shows that the proposed scheme saves up to 78% of the power consumed by the previous method, which activates as many processing cores as possible to execute the given tasks.
https://doi.org/10.9708/jksci.2012.17.8.011
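
The idea of running a core at the lowest clock frequency that still meets deadlines can be sketched for a single core. The following is an illustration under stated assumptions, not the paper's scheme: tasks are periodic with deadlines equal to their periods, the core schedules them with EDF (so the task set is feasible when utilization is at most 1), and execution time scales inversely with frequency. The function name, the `(wcet_at_fmax, period)` task format, and the frequency list are hypothetical.

```python
def lowest_feasible_frequency(tasks, frequencies):
    """Pick the lowest available frequency that keeps the task set schedulable.

    tasks:       list of (wcet_at_fmax, period) pairs, in the same time unit.
    frequencies: available clock frequencies; the largest one is f_max.
    Returns None if the task set is infeasible even at f_max.
    """
    f_max = max(frequencies)
    # Utilization when running at the maximum frequency.
    utilization = sum(wcet / period for wcet, period in tasks)
    for f in sorted(frequencies):
        # Slowing down from f_max to f stretches execution times by f_max / f.
        if utilization * f_max / f <= 1.0:
            return f
    return None


tasks = [(1, 4), (1, 5)]  # utilization 0.45 at f_max
print(lowest_feasible_frequency(tasks, [400, 800, 1200, 1600]))
```

Because dynamic power typically grows faster than linearly with frequency, choosing the lowest feasible frequency is a common way to reduce power while still meeting deadlines.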

Performance Evaluation and Optimization of Journaling File Systems with Multicores and High-Performance Flash SSDs (멀티코어 및 고성능 플래시 SSD 환경에서 저널링 파일 시스템의 성능 평가 및 최적화)

Han, Hyuck
- The Journal of the Korea Contents Association
- v.18 no.4
- pp.178-185
- 2018
Recently, demand for computer systems with multicore CPUs and high-performance flash-based storage devices (i.e., flash SSDs) has grown rapidly in cloud computing, super-computing, and enterprise storage/database systems. Journaling file systems running on high-performance systems do not exploit the full I/O bandwidth of high-performance SSDs. In this article, we evaluate and analyze the performance of the Linux EXT4 file system with high-performance SSDs and multicore CPUs. The system used in this study has 72 cores and an Intel NVMe SSD, and the flash SSD delivers up to 2800/1900 MB/s for sequential read/write operations. Our experimental results show that checkpointing in the EXT4 file system is a major overhead. Furthermore, we optimize the checkpointing procedure, and our optimized EXT4 file system shows up to 92% better performance than the original EXT4 file system.
https://doi.org/10.5392/JKCA.2018.18.04.178

Tile Partitioning-based HEVC Parallel Decoding Optimization for Asymmetric Multicore Processor (비대칭 멀티코어 시스템 상의 HEVC 병렬 디코딩 최적화를 위한 타일 분할 기법)

Ryu, Yeongil;Roh, Hyun-Joon;Ryu, Eun-Seok
- Journal of KIISE
- v.43 no.9
- pp.1060-1065
- 2016
Recently, a need for parallel UHD video processing has emerged, and the use of computing systems with asymmetric processors such as ARM big.LITTLE is actively increasing. Thus, a new parallel UHD video processing method optimized for asymmetric multicore systems is needed. This paper proposes a novel HEVC tile partitioning method for parallel processing based on an analysis of the computational power of asymmetric multicores. The proposed method analyzes (1) the computing power of the asymmetric multicores and (2) a regression model of computational complexity per video resolution. Finally, the model (3) determines the optimal HEVC tile resolution for each core and partitions and allocates the tiles to suitable cores. The proposed method minimizes the gap in decoding time between the fastest and the slowest CPU core. Experimental results with the official 4K UHD test sequences show an average 20% improvement in decoding speedup on the ARM asymmetric multicore system.
https://doi.org/10.5626/JOK.2016.43.9.1060
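
One simple way to size tiles for cores of unequal speed is to split the frame in proportion to each core's relative throughput. The sketch below is only an illustration of that idea and is not the regression-based method the paper describes: it assumes decoding cost is proportional to tile size, measures size in whole columns of coding tree units, and uses hypothetical names and speed values.

```python
def split_proportionally(total_units, speeds):
    """Split total_units into integer parts proportional to per-core speeds.

    total_units: number of indivisible units to distribute (here: CTU columns).
    speeds:      relative throughput of each core (assumed, e.g. big = 2, little = 1).
    Rounding leftovers go to the cores with the largest fractional share.
    """
    total_speed = sum(speeds)
    shares = [total_units * s / total_speed for s in speeds]
    sizes = [int(share) for share in shares]
    leftover = total_units - sum(sizes)
    by_fraction = sorted(range(len(speeds)),
                         key=lambda i: shares[i] - sizes[i],
                         reverse=True)
    for i in by_fraction[:leftover]:
        sizes[i] += 1
    return sizes


# A 3840-pixel-wide frame with 64-pixel CTUs has 60 CTU columns.
speeds = [2, 2, 1, 1]  # two fast cores, two slow cores (hypothetical ratio)
columns = split_proportionally(60, speeds)
print(columns)
print([c / s for c, s in zip(columns, speeds)])  # estimated relative decode times
```

When the shares divide evenly, every core gets the same estimated decode time, which is the condition under which the gap between the fastest and slowest core disappears in this simplified model.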

Probabilistic Power-saving Scheduling of a Real-time Parallel Task on Discrete DVFS-enabled Multi-core Processors (이산적 DVFS 멀티코어 프로세서 상에서 실시간 병렬 작업을 위한 확률적 저전력 스케쥴링)

Lee, Wan Yeon
- Journal of the Korea Society of Computer and Information
- v.18 no.2
- pp.31-39
- 2013
In this paper, we propose a power-efficient scheduling scheme that stochastically minimizes the power consumption of a real-time parallel task while meeting its deadline on multicore processors. The proposed scheme applies parallel processing, executing a task on multiple cores concurrently, and activates only some of the available cores while powering off the unused ones in order to save power. We prove that the proposed scheme minimizes the mean power consumption of a real-time parallel task with a probabilistic computation amount on DVFS-enabled multicore processors with a finite set of discrete clock frequencies. Evaluation shows that the proposed scheme saves up to 81% of the power consumed by the previous method.
https://doi.org/10.9708/jksci.2013.18.2.031

Search Result 143, Processing Time 0.025 seconds