Search | Korea Science

Analysis of GPU Performance and Memory Efficiency according to Task Processing Units (작업 처리 단위 변화에 따른 GPU 성능과 메모리 접근 시간의 관계 분석)

Son, Dong Oh;Sim, Gyu Yeon;Kim, Cheol Hong
- Smart Media Journal / v.4 no.4 / pp.56-63 / 2015
Modern GPUs can execute massively parallel computation by exploiting many GPU cores. The GPGPU architecture, one approach to exploiting the outstanding computational resources of the GPU, effectively executes general-purpose applications as well as graphics applications. In this paper, we investigate how the number of CTAs (Cooperative Thread Arrays) on an SM (Streaming Multiprocessor) affects memory efficiency and performance, since analyzing this relationship can guide researchers who study GPUs in improving performance. Our simulation results show that for most benchmarks, increasing the number of CTAs on an SM improves performance. Some benchmarks, however, show no improvement, because the kernel generates too few CTAs or because too few CTAs execute simultaneously. To classify the performance results more precisely, we also analyze the relation between performance and memory stalls, DRAM stalls due to interconnect congestion, and pipeline stalls at the memory stage. We expect our analysis to help studies that aim to improve parallelism and memory efficiency on the GPGPU architecture.
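
As a rough illustration of the quantity varied in this study, the number of CTAs that can reside on one SM at the same time is bounded by whichever per-SM resource runs out first. The sketch below is not taken from the paper: the function name `max_resident_ctas` and every default resource limit are hypothetical placeholder values, and real hardware applies allocation-granularity rules that are ignored here.

```python
def max_resident_ctas(threads_per_cta, regs_per_thread, smem_per_cta,
                      sm_max_threads=1536, sm_max_regs=32768,
                      sm_smem_bytes=49152, sm_max_ctas=8):
    """Upper bound on CTAs resident on one SM.

    All sm_* defaults are hypothetical limits chosen for illustration;
    they do not describe any particular GPU or the paper's simulator.
    """
    if threads_per_cta <= 0:
        raise ValueError("threads_per_cta must be positive")

    # Limit imposed by the thread slots available on the SM.
    by_threads = sm_max_threads // threads_per_cta

    # Limit imposed by the register file (skipped if no registers are used).
    if regs_per_thread > 0:
        by_regs = sm_max_regs // (regs_per_thread * threads_per_cta)
    else:
        by_regs = sm_max_ctas

    # Limit imposed by shared memory (skipped if the CTA uses none).
    if smem_per_cta > 0:
        by_smem = sm_smem_bytes // smem_per_cta
    else:
        by_smem = sm_max_ctas

    # The tightest constraint, capped by the hard CTA slot limit, wins.
    return min(by_threads, by_regs, by_smem, sm_max_ctas)
```

Under these assumed limits, a kernel with large CTAs or heavy register use reaches its bound well before the CTA slot limit, which is one way a benchmark can end up with too few simultaneously executing CTAs.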

An Optimization Method for Hologram Generation on Multiple GPU-based Parallel Processing (다중 GPU기반 홀로그램 생성을 위한 병렬처리 성능 최적화 기법)

Kook, Joongjin
- Smart Media Journal / v.8 no.2 / pp.9-15 / 2019
Since the computational complexity of hologram generation increases exponentially with the size of the point cloud, parallel processing on multiple GPUs using the CUDA and/or OpenCL libraries has recently become popular. A CUDA kernel for parallelization needs to be organized into threads, blocks, and grids in accordance with the number of cores and the memory size of the GPU. In addition, in multi-GPU environments, the work must be distributed grid-by-grid, block-by-block, or thread-by-thread according to the number of GPUs. To evaluate the performance of CGH (Computer Generated Hologram) generation, we compared the computational speed in CPU, single-GPU, and multi-GPU environments while gradually increasing the number of points in a point cloud from 10 to 1,000,000. We also present a memory structure design and a calculation method required for CUDA-based parallel processing to accelerate CGH generation in multi-GPU environments.
https://doi.org/10.30693/SMJ.2019.8.2.09
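
One simple way to picture distributing a point cloud over several GPUs is to give each device a contiguous slice of the points. The sketch below assumes that each GPU computes a partial hologram for its slice and that the partial results are summed afterward; this is an illustrative arrangement, not necessarily the paper's method, and the function name `split_points` is hypothetical.

```python
def split_points(num_points, num_gpus):
    """Return one half-open (start, end) index range per GPU.

    Points are divided as evenly as possible; when the division is not
    exact, the first few GPUs each receive one extra point.
    """
    if num_gpus <= 0:
        raise ValueError("num_gpus must be positive")
    if num_points < 0:
        raise ValueError("num_points must be non-negative")

    base, extra = divmod(num_points, num_gpus)
    ranges = []
    start = 0
    for gpu in range(num_gpus):
        count = base + (1 if gpu < extra else 0)
        ranges.append((start, start + count))
        start += count
    return ranges
```

For very small point clouds (the evaluation starts at 10 points), some slices become tiny or empty, so the per-device launch overhead can outweigh the benefit of splitting.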

A Model for Reducing Priority Inversion in Real Time Server System (실시간 서버 시스템에서 우선 순위 반전현상을 감소하기 위한 모델)

Choe, Dae-Su;Im, Jong-Gyu;Gu, Yong-Wan
- The Transactions of the Korea Information Processing Society / v.6 no.11 / pp.3131-3139 / 1999
Satisfying the rigid timing requirements of various real-time activities in real-time systems often requires special methods to tune the system's run-time behavior. Unbounded blocking can occur when a high-priority activity cannot preempt a low-priority activity; in such a situation, a priority inversion is said to have occurred. Priority inversion is one of the problems that may prevent threads from meeting their deadlines in real-time systems. It is difficult to remove priority inversion from the kernel while also bounding the worst-case blocking time of threads. A thread is a piece of executable code that has access to data and a stack. In this paper, a new real-time server model that minimizes the duration of priority inversion is proposed. The proposed server model provides a framework for building a better server structure, which can not only minimize the duration of priority inversion but also reduce the deadline miss ratio of higher-priority threads.

Real-Time Characteristics Analysis and Improvement for OPRoS Component Scheduler on Windows NT Operating System (Windows NT상에서의 OPRoS 컴포넌트 스케줄러의 실시간성 분석 및 개선)

Lee, Dong-Su;Ahn, Hee-June
- Journal of Institute of Control, Robotics and Systems / v.17 no.1 / pp.38-46 / 2011
The OPRoS (Open Platform for Robotic Service) framework provides a uniform operating environment for service robots. Because an OPRoS-based service robot has to support real-time as well as non-real-time applications, using a Windows NT kernel-based operating system can be restrictive. On the other hand, building service robots on Windows NT brings benefits such as rich library and device support and an abundant developer pool. This paper presents a user-mode component scheduler for OPRoS that provides near-real-time scheduling service on Windows NT, based on the restricted real-time features of the Windows NT kernel. The component scheduler thread, running at the highest real-time priority in the Windows NT system, acquires control of the CPU. The component scheduler then suspends and resumes each periodic component executor based on its priority and precedence dependencies, so that the component executors are scheduled preemptively. We present an experimental analysis of the performance limitations of the proposed scheduling technique. The analysis and experimental results show that the proposed scheduler guarantees highly reliable timing down to a resolution of 10 ms.
https://doi.org/10.5302/J.ICROS.2011.17.1.38
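
The idea of resuming executors by priority while respecting precedence dependencies can be sketched as a selection loop. The code below is a simplified illustration, not the OPRoS implementation: the `Executor` class, the `run_order` function, and the convention that a larger number means higher priority are all assumptions, and periods and preemption within a cycle are deliberately left out.

```python
from dataclasses import dataclass


@dataclass
class Executor:
    name: str
    priority: int            # assumption: larger value = more urgent
    depends_on: tuple = ()   # names that must finish first in this cycle


def run_order(executors):
    """Order in which a single scheduler would resume executors in one cycle.

    At each step, every executor whose predecessors have finished is
    "ready"; the ready executor with the highest priority is resumed and
    all others stay suspended.
    """
    done = []
    pending = {e.name: e for e in executors}
    while pending:
        ready = [e for e in pending.values()
                 if all(dep in done for dep in e.depends_on)]
        if not ready:
            raise ValueError("cyclic or missing precedence dependency")
        nxt = max(ready, key=lambda e: e.priority)
        done.append(nxt.name)
        del pending[nxt.name]
    return done
```

Note that in this sketch a high-priority executor still waits for a lower-priority predecessor, so precedence takes effect before priority.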

Implementation of Dual-Kernel based Control System and Evaluation of Real-time Control Performance for Intelligent Robots (지능형 로봇을 위한 이중 커널 구조의 제어 시스템 구현 및 실시간 제어 성능 분석)

Park, Jeong-Ho;Yi, Soo-Yeong;Choi, Byoung-Wook
- Journal of Institute of Control, Robotics and Systems / v.14 no.11 / pp.1117-1123 / 2008
This paper implements a dual-kernel system using standard Linux and real-time embedded Linux for the real-time control of intelligent robot systems. Such a system provides more useful services, including standard Linux threads, which make complicated tasks easy to implement, and real-time tasks for deterministic response in velocity control. Here, an open-source real-time embedded Linux, XENOMAI, is ported to an embedded target board, and a real-time serial device driver is adopted for interfacing with the motor controller. The real-time task was implemented with a priority to keep the cyclic control command for trajectory control. To validate the deterministic response of the proposed system, the delay in performing trajectory control with a feedback loop is measured and compared with non-real-time standard Linux. The proposed software architecture is expected to take advantage of features of both standard Linux and real-time operating systems for intelligent robot systems.
https://doi.org/10.5302/J.ICROS.2008.14.11.1117

A Modelling of Kernel-Thread Web Accelerator Using Coloured Petri-Net (Coloured Petri-Net을 이용한 커널 스레드 웹 가속기 모델링)

Hwang, June;Min, Byung-Jo;Kim, Hag-Bae
- Proceedings of the Korea Information Processing Society Conference / 2003.11a / pp.385-388 / 2003
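With the surge in Internet users prompting efforts to reduce traffic, we previously proposed a method for improving web server performance at the Linux kernel level. By using kernel-space threads instead of user-space threads, and only as many threads as there are CPUs, we were able to reduce the overhead of context switching between threads and of system calls. In this work, we logically decompose this structure using CPN (Coloured Petri-Net) and examine its operating characteristics; by comparing the model with actual data, logical problems in the program's operation can be found and the timing characteristics of its operation can be determined.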

A GUI-based Management Tool for Wireless Sensor Networks (GUI 기반의 센서 네트워크 관리 도구)

Jung, In-Uk;Cha, Ho-Jung
- Proceedings of the Korean Information Science Society Conference / 2006.10b / pp.62-65 / 2006
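Application development for sensor networks is difficult because of limited resources and unexpected problems caused by radio communication. Once a sensor network has been formed, its internal data flow and the state of each participating node can be learned by collecting information from the network. This paper proposes a sensor network management system that monitors the state of the network and efficiently configures network parameters in real time. Sensor Network Manager was developed on the RETOS operating system, which supports multithreading and dynamically loadable kernel modules. Monitoring the state of a sensor network with a simple flooding scheme verified that Sensor Network Manager gives developers overall control and monitoring of the network.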

Developement of a Web Accelerator in the Kernel (커널레벨에서의 웹가속기 개발에 대한 연구)

Ko, Soung-Jun;Park, Jyoung-Gue;Min, Byung-Jo;Kim, Hag-Bae
- Proceedings of the KIEE Conference / 2001.07d / pp.2672-2674 / 2001
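A web accelerator runs on the same computer as the web server, and its main purpose is to improve the web server's service speed. Unlike a caching server, it requires no separate machine; because it intercepts and handles client requests at the kernel level, it can serve them faster than an ordinary user-space application such as Apache. It can handle both static and dynamic pages, and because it operates at the kernel level it reduces the speed overhead of the multithreaded structure of ordinary web servers and the resources wasted on memory copies and disk accesses, thereby improving the web server's response time and the number of requests handled per second (requests/sec).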

A study on Secure Socket Layer WEB Acceleration using Linux Kernel Thread (리눅스 커널에서 구현한 웹서버 암호화 가속 기법에 대한 연구)

Hwang, Jun;Min, Byung-Jo;Nahm, Eui-Seok;Kim, Hag-Bae;Chang, Whie
- Proceedings of the Korea Information Processing Society Conference / 2002.11a / pp.489-492 / 2002
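With the explosive growth of Internet e-commerce, cases of personal and corporate information leaking online are increasing. Accordingly, there is demand for an Internet security method that adapts flexibly to changes in protocols and algorithms without adding new hardware. This paper proposes a new SSL (Secure Socket Layer) processing structure that reduces response time and server load by using kernel threads independent of user space and referencing libraries ported to kernel space to handle users' web page requests.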

Design and Implementation of Scalable VOD System on Linux (Linux상에서 확장 가능한 VOD시스템의 설계 및 구현)

김정원;김인환;정기동
- Journal of Korea Multimedia Society / v.2 no.3 / pp.265-276 / 1999
A Video on Demand (VOD) system is one of the main applications of the upcoming multimedia era. In this research, we designed and implemented a host-based scalable VOD system (SVOD) that is composed of low-cost PC servers and runs on the Linux kernel, which is currently in the spotlight in enterprise and research domains. Our contributions are as follows. First, the existing Ext2 file system was modified to efficiently support continuous media such as MPEG streams. Second, the storage server features a host-based scalable architecture. Third, a software MPEG decoder was implemented using Microsoft's DirectShow® COM. Finally, flow control between client and server is provided to suppress overflow and underflow of the client's circular buffer, and the FF (fast-forward) VCR operation is supported. We found that it is possible to develop a thread-based, scalable VOD system on low-cost PC servers and the free Linux kernel.

Search Result 27, Processing Time 0.027 seconds
