Search | Korea Science

A new warp scheduling technique for improving the performance of GPUs by utilizing MSHR information (GPU 성능 향상을 위한 MSHR 정보 기반 워프 스케줄링 기법)

Kim, Gwang Bok;Kim, Jong Myon;Kim, Cheol Hong
- The Journal of Korean Institute of Next Generation Computing / v.13 no.3 / pp.72-83 / 2017
GPUs provide high throughput by executing many warps in parallel to hide latency. The MSHRs (Miss Status Holding Registers) of the L1 data cache track cache miss requests until the required data is returned from lower-level memory. In recent GPUs, excessive requests for cache resources cause reservation failures, which leave GPU resources underutilized. In this paper, we propose a new warp scheduling technique that reduces stall cycles under MSHR resource shortage. The cache miss rate of each warp is predicted based on the observation that each warp shows a similar cache miss rate over a long period. When the MSHRs are full, warps with low miss rates and computation-intensive warps are given high priority for issue. Our proposal improves GPU performance by using cache resources more efficiently, based on cache miss rate prediction and monitoring of the MSHR entries. According to our experimental results, the proposed scheduling technique reduces reservation fail cycles by 25.7% and increases IPC by 6.2% compared to the loose round-robin scheduler.
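
The selection policy described in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the `Warp` class, the `select_warp` function, the use of observed hit/miss counters as the miss-rate predictor, and the simplified loose round-robin fallback are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Warp:
    """Hypothetical per-warp state; wid is assumed to equal the list index."""
    wid: int
    hits: int = 0
    misses: int = 0
    ready: bool = True

    def miss_rate(self) -> float:
        # Assumed predictor: the miss rate observed so far.
        # A warp with no memory accesses (compute-intensive) scores 0.0.
        total = self.hits + self.misses
        return self.misses / total if total else 0.0


def select_warp(warps, last_issued, mshr_used, mshr_entries):
    """Return the id of the warp to issue next, or None if no warp is ready."""
    ready = [w for w in warps if w.ready]
    if not ready:
        return None
    if mshr_used >= mshr_entries:
        # MSHRs full: prefer the ready warp least likely to miss,
        # breaking ties by warp id.
        return min(ready, key=lambda w: (w.miss_rate(), w.wid)).wid
    # MSHRs available: simplified loose round robin, i.e. the first
    # ready warp after the one issued last.
    n = len(warps)
    for step in range(1, n + 1):
        candidate = warps[(last_issued + step) % n]
        if candidate.ready:
            return candidate.wid
    return None
```

With free MSHR entries the sketch behaves like a plain round-robin scheduler; only when every entry is occupied does the predicted miss rate decide which warp issues.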

Hybrid Scheme of Data Cache Design for Reducing Energy Consumption in High Performance Embedded Processor (고성능 내장형 프로세서의 에너지 소비 감소를 위한 데이타 캐쉬 통합 설계 방법)

Shim, Sung-Hoon;Kim, Cheol-Hong;Jhang, Seong-Tae;Jhon, Chu-Shik
- Journal of KIISE:Computer Systems and Theory / v.33 no.3 / pp.166-177 / 2006
Cache sizes in embedded processors tend to grow as technology scales to smaller transistors and lower supply voltages. However, a larger cache demands more energy, so the ratio of cache energy consumption to total processor energy is growing. Many schemes have been proposed for reducing cache energy consumption, but these previous schemes address only one side of the problem: dynamic cache energy only or static cache energy only. In this paper, we propose a hybrid scheme that reduces dynamic and static cache energy simultaneously. For this hybrid scheme, we adopt two existing techniques: the drowsy cache technique to reduce static cache energy consumption and the way-prediction technique to reduce dynamic cache energy consumption. Additionally, we propose an early wake-up technique based on the program counter to reduce the penalty caused by applying the drowsy cache technique. We focus on the level 1 data cache. The hybrid scheme reduces static and dynamic cache energy consumption simultaneously, and our early wake-up scheme reduces the extra program execution cycles caused by applying the hybrid scheme.

Acceleration of LU-SGS Code on Latest Microprocessors Considering the Increase of Level 2 Cache Hit-Rate (최신 마이크로프로세서에서 2차 캐쉬 적중률 증가를 고려한 LU-SGS 코드의 가속)

Choi, J.Y.;Oh, Se-Jong
- Journal of the Korean Society for Aeronautical & Space Sciences / v.30 no.7 / pp.68-80 / 2002
An approach for composing performance-optimized computational code for the latest microprocessors is suggested. The concept of the code optimization, called localization here, is to maximize utilization of the second-level cache, which is common to all the latest computer systems, and to minimize accesses to system main memory. In this study, the localized optimization of the LU-SGS (Lower-Upper Symmetric Gauss-Seidel) code for the solution of fluid dynamic equations was carried out at three different levels and tested on several of the most widely used microprocessor architectures. The test results showed a remarkable performance gain of up to 7.35 times faster solution, depending on the system, than the baseline algorithm, while producing exactly the same solution on the same computer system.
https://doi.org/10.5139/JKSAS.2002.30.7.068

Design and Implementation of a High-Performance Index Manager in a Main Memory DBMS (주기억장치 DBMS를 위한 고성능 인덱스 관리자의 설계 및 구현)

Kim, Sang-Wook;Lee, Kyung-Tae;Choi, Wan
- The Journal of Korean Institute of Communications and Information Sciences / v.28 no.7B / pp.605-619 / 2003
The main memory DBMS (MMDBMS) efficiently supports various database applications that require high performance, since it employs main memory rather than disk as primary storage. In this paper, we discuss the index manager of Tachyon, a next-generation MMDBMS. The gap between CPU processing time and main memory access time has recently been widening due to rapid advances in CPU technology. By devising data structures and algorithms that exploit the behavior of the CPU cache, we are able to enhance the overall performance of MMDBMSs considerably. We address the practical implementation issues, and our solutions to them, encountered in developing the cache-conscious index manager of Tachyon. The main issues covered are (1) consideration of the cache behavior, (2) compact representation of the index entry and the index node, (3) support of variable-length keys, (4) support of multiple-attribute keys, (5) support of duplicated keys, (6) definition of the system catalog for indexes, (7) definition of external APIs, (8) concurrency control, and (9) backup and recovery. We also show the effectiveness of our approach through extensive experiments.

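MMT, a New Alternative for Next-Generation Broadcasting and Internet Multimedia Delivery Services (MMT, 차세대 방송 및 인터넷 멀티미디어 전송 서비스를 위한 새로운 대안)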

Im, Yeong-Gwon
- Information and Communications Magazine / v.30 no.5 / pp.11-17 / 2013
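Over the past several decades, MPEG has successfully developed technical standards for multimedia delivery such as MPEG-2 TS and the ISO file format. Changes in the multimedia service environment, driven by the explosive growth of multimedia services over the Internet, have brought new requirements for multimedia delivery standards. Major examples of these requirements include flexible and dynamic access to multimedia components, easy conversion between storage formats and packetized delivery formats, and combined use of multimedia components from various providers, including caches and storage devices within terminals. MPEG is developing the MPEG Media Transport (MMT) standard to respond to these requirements. This article examines the challenges and limitations that existing multimedia delivery technologies such as MPEG-2 TS and RTP face under the new requirements, and briefly introduces how MMT addresses these challenges and limitations.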

A Study of performance improvement of Mobile IP for handoff in IPv6 environment (IPv6 환경에서 Mobile IP handoff 성능향상 관한 연구)

Yang, Hee-Won;Yoe, Hyun
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2002.11a / pp.275-278 / 2002
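The IETF proposed the Mobile IP protocol to provide mobility. To solve the address exhaustion problem, the IETF adopted IPv6 as the next-generation Internet protocol, which can provide enough addresses to meet continuing future demand. This paper examines the process of minimizing packet loss when a handoff occurs in Mobile IP. In addition, by redesigning the Mobile IP functions for IPv6, it proposes the introduction of a binding cache server and a regional registration scheme to improve performance during the handover that occurs when a mobile node changes its point of attachment. To simulate the proposal, Network Simulator (version 2) running on a Linux machine was used.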

Analysis on the Performance Impact of Partitioned LLC for Heterogeneous Multicore Processors (이종 멀티코어 프로세서에서 분할된 공유 LLC가 성능에 미치는 영향 분석)

Moon, Min Goo;Kim, Cheol Hong
- The Journal of Korean Institute of Next Generation Computing / v.15 no.2 / pp.39-49 / 2019
Recently, CPU-GPU integrated heterogeneous multicore processors have been widely used to improve the performance of computing systems. Heterogeneous multicore processors integrate CPUs and GPUs on a single chip, where the CPUs and GPUs share the LLC (Last Level Cache). This sharing causes a serious cache contention problem inside the processor, resulting in significant performance degradation. In this paper, we propose a partitioned LLC architecture to solve the cache contention problem in heterogeneous multicore processors. We analyze the performance impact of varying the LLC size assigned to the CPUs and to the GPUs. According to our simulation results, CPU performance improves by up to 21% as the LLC size assigned to the CPU grows. However, the GPU shows a negligible performance difference when its assigned LLC size increases; in other words, the GPU is less likely to lose performance when its LLC size decreases. Because the performance degradation from reducing the GPU's LLC size is much smaller than the performance improvement from increasing the CPU's LLC size, applying a partitioned LLC to CPUs and GPUs is expected to improve the overall performance of heterogeneous multicore processors. In addition, developing a memory management technique that can maximize the performance of each core could greatly improve the performance of heterogeneous multicore processors in the future.
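
The isolation effect of partitioning can be shown with a toy model. The abstract does not state how the partition is enforced, so the sketch below assumes way partitioning of a single cache set with LRU replacement inside each partition; the `PartitionedSet` class, the 6/2 way split, and the access pattern are hypothetical.

```python
from collections import OrderedDict


class PartitionedSet:
    """One cache set whose ways are split between requesters (assumed scheme)."""

    def __init__(self, ways_per_owner):
        self.ways = dict(ways_per_owner)
        # One LRU-ordered tag store per owner; oldest entry comes first.
        self.lines = {owner: OrderedDict() for owner in self.ways}

    def access(self, owner, tag):
        """Return True on a hit, False on a miss (the line is then filled)."""
        part = self.lines[owner]
        if tag in part:
            part.move_to_end(tag)      # mark as most recently used
            return True
        if self.ways[owner] == 0:
            return False               # owner has no ways: nothing to fill
        if len(part) >= self.ways[owner]:
            part.popitem(last=False)   # evict the LRU line of this owner only
        part[tag] = True
        return False


# Hypothetical workload: the CPU reuses 4 lines while the GPU streams 100.
llc_set = PartitionedSet({"cpu": 6, "gpu": 2})
for tag in range(4):
    llc_set.access("cpu", tag)
for tag in range(100, 200):
    llc_set.access("gpu", tag)
cpu_hits = sum(llc_set.access("cpu", tag) for tag in range(4))
```

Because evictions are confined to the requester's own ways, the streaming GPU accesses cannot displace the CPU's working set, so all four CPU re-accesses hit.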

Design and Implementation of Mobile IP for Handover Performance Improvement (핸드오버 성능향상을 위한 Mobile IP의 설계 및 구현)

박석천;정선화;이정준;정운영
- Journal of Internet Computing and Services / v.3 no.2 / pp.27-37 / 2002
This paper analyzes the problems of existing Mobile IP, redesigns its component functions, and proposes a handover method using LRS and BCS to improve handover performance. When a mobile node moves to another domain, the proposed BCS recognizes its mobility; the BCS then manages the mobile node's binding information and serves micro mobility. When handover occurs among domains, micro mobility is enabled using the LRS. Because the BCS and LRS have a buffering function, they can reduce packet loss by forwarding the buffered datagrams. The designed components were implemented in a Linux environment, and handover performance was evaluated by simulation. The proposed handover method outperforms the existing method in both transmission delay time and packet loss.

Implementation of RTOS Simulator With Execution Time Estimation (실행시간 추정 가능한 RTOS 시뮬레이터의 구현)

김방현;류성준;김종현;남영광;이광용
- Proceedings of the Korea Society for Simulation Conference / 2002.05a / pp.125-129 / 2002
An RTOS simulator, one of the tools provided in a real-time operating system (RTOS) development environment, offers a target simulation environment that makes it possible to develop and debug applications on the host even when the target hardware is not connected to it. It thereby helps developers build applications quickly and allows application development to begin before hardware development is complete. For this reason, most commercial RTOS development environments currently provide an RTOS simulator. However, most current commercial RTOS simulators are implemented so that only the functional aspects of the RTOS run on the host, which makes it impossible to estimate the actual time taken when the RTOS or an RTOS application runs on the real target. Given that a real-time system must produce results within a specified time, this is the most serious shortcoming of an RTOS simulator, so an RTOS simulator that supports execution time estimation and is also practical is needed. To address this problem, this study developed a machine instruction-based RTOS simulator that can estimate the execution time of the RTOS and RTOS applications on the real target and that is suitable for commercialization. Furthermore, it improves the accuracy of execution time estimation by accounting for the effects of the pipeline and the cache, which are major factors in execution time. The RTOS used in this study is Q+, developed in 2000 by the Electronics and Telecommunications Research Institute (ETRI), and the target hardware on which Q+ runs is the EBSA-285 board, equipped with an ARM-family StrongARM SA-110 microprocessor and a 21285 main controller.

