Search | Korea Science

김태균
- Proceedings of the Korean Information Science Society Conference
- /
- 2000.10c
- /
- pp.609-611
- /
- 2000
CC-NUMA 시스템은 SMP 시스템의 장점인 프로그래밍의 편리함, 작업 환경의 유연함 및 관리의 용이함 등을 유지하는 한편, SMP의 단점이었던 확장성까지 제공한다. 더욱이 메모리 장벽 즉 급격히 빨라지는 프로세서의 처리 속도에 비해 메모리의 속도는 거의 변화가 없음으로 인하여 야기되는 문제를 극복할 수 있는 구조적인 대안으로 각광받고 있다. 이러한 CC-NUMA 시스템은 노드간의 논리적인 거리가 길기 때문에 프로세싱 노드간의 통신이 시스템의 성능에 영향을 미치는 가장 핵심 요소가 된다. 따라서 노드간의 통신을 최소화 해주기 위한 노력으로 각 노드에 장착되어지는 원격 캐쉬의 중요성이 강조된다. 본 논문에서는 CC-NUMA 시스템에서는 노드간 데이터 통신의 유형을 파악하고, 원격 캐쉬의 블록 사이즈에 따른 이들의 발생횟수의 변화를 분석하였다. 인스트럭션 시뮬레이터인 CacheMire와 II 벤치마크 중 하나인 FFT를 이용하여 실행-구동 시뮬레이션을 통해 원격캐쉬 블록의 크기가 증가할수록 노드간 통신의 횟수는 물론 전송되는 데이터의 절대적인 양이 감소한다는 사실을 알 수 있었다.
PDF

Na-Hyun Kim;Jeong-Beom Kim
- The Journal of the Korea institute of electronic communication sciences
- /
- v.18 no.5
- /
- pp.777-784
- /
- 2023
A sense amplifier is an essential peripheral circuit for designing a memory and is used to sense a small differential input signal and amplify it into digital signal. In this paper, a high-speed sense amplifier applicable to in-memory computing circuits is proposed. The proposed circuit reduces sense delay time through transistor Mtail that provides an additional discharge path and improves the circuit performance of the sense amplifier by applying m-GDI (: modified Gate Diffusion Input). Compared with previous structure, the sense delay time was reduced by 16.82%, the PDP(: Power Delay Product) by 17.23%, the EDP(: Energy Delay Product) by 31.1%. The proposed circuit was implemented using TSMC's 65nm CMOS process, while its feasibility was verified through SPECTRE simulation in this study.
https://doi.org/10.13067/JKIECS.2023.18.5.777 인용 PDF

Jeonggeun Kim
- Journal of the Semiconductor & Display Technology
- /
- v.22 no.3
- /
- pp.142-148
- /
- 2023
In this paper, we identify performance issues in executing compute kernels from PolyBench, which includes compute kernels that are the core computational units of various data-intensive workloads, such as deep learning and data-intensive applications, on Processing-in-Memory (PIM) devices. Therefore, using our in-house simulator, we measured and compared the various performance metrics of workloads based on traditional out-of-order and in-order processors with Processing-in-Memory-based systems. As a result, the PIM-based system improves performance compared to other computing models due to the short-term data reuse characteristic of computational kernels from PolyBench. However, some kernels perform poorly in PIM-based systems without a multi-layer cache hierarchy due to some kernel's long-term data reuse characteristics. Hence, our evaluation and analysis results suggest that further research should consider dynamic and workload pattern adaptive approaches to overcome performance degradation from computational kernels with long-term data reuse characteristics and hidden data locality.
PDF

Jeonggeun Kim
- Journal of the Semiconductor & Display Technology
- /
- v.22 no.3
- /
- pp.78-83
- /
- 2023
This paper proposes a lightweight trace-driven Processing-In-Memory (PIM) simulator, TP-Sim. TP-Sim is a General Purpose PIM (GP-PIM) simulator that evaluates various PIM system performance-related metrics. Based on instruction and memory traces extracted from the Intel Pin tool, TP-Sim can replay trace files for multiple models of PIM architectures to compare its performance. To verify the availability of TP-Sim, we estimated three different system configurations on the STREAM benchmark. Compared to the traditional Host CPU-only systems with conventional memory hierarchy, simple GP-PIM architecture achieved better performance; even the Host CPU has the same number of in-order cores. For further study, we also extend TP-Sim as a part of a heterogeneous system simulator that contains CPU, GPGPU, and PIM as its primary and co-processors.
PDF