Search | Korea Science

Optimizing Shared Memory Accesses for GPGPU Computations (GPGPU를 위한 공유 메모리 최적화)

Tran, Nhat-Phuong;Lee, Myungho;Hong, Sugwon
- Proceedings of the Korea Information Processing Society Conference
- /
- 2012.11a
- /
- pp.197-199
- /
- 2012
Recently, a lot of general-purpose application programs in addition to graphic applications have been parallelized for boosting their performance using Graphic Processing Unit (GPU)'s excellent floating-point performance. In order to maximize the application performance on GPUs, optimizing the memory hierarchy and the on-chip caches such as the shared memory is essential. In this paper, we propose techniques to optimize the shared memory, and verify its effectiveness using a pattern matching application program.
https://doi.org/10.3745/PKIPS.y2012m11a.197 인용 PDF

Multi-operation-based Constrained Random Verification for On-Chip Memory

Son, Hyeonuk;Jang, Jaewon;Kim, Heetae;Kang, Sungho
- JSTS:Journal of Semiconductor Technology and Science
- /
- v.15 no.3
- /
- pp.423-426
- /
- 2015
Current verification methods for on-chip memory have been implemented using coverpoints that are generated based on a single operation. These coverpoints cannot consider the influence of other memory banks in a busy state. In this paper, we propose a method in which the coverpoints account for all operations executed on different memory banks. In addition, a new constrained random vector generation method is proposed to reduce the required random vectors for the multi-operation-based coverpoints. The simulation results on NAND flash memory show 100% coverage with 496,541 constrained random vectors indicating a reduction of 96.4% compared with conventional random vectors.
https://doi.org/10.5573/JSTS.2015.15.3.423 인용 PDF KSCI

Wafer Burn-in Method of SRAM for Multi Chip Package

Kim, Hoo-Sung;Kim, Je-Yoon;Sung, Man-Young
- Transactions on Electrical and Electronic Materials
- /
- v.5 no.4
- /
- pp.138-142
- /
- 2004
This paper presents the improved bum-in method for the reliability of SRAM in Multi Chip Package (MCP). Semiconductor reliability is commonly improved through the bum-in process. Reliability problem is more significant in MCP that includes over two chips in a package, because the failure of one chip (SRAM) has a large influence on the yield and quality of the other chips - Flash Memory, DRAM, etc. Therefore, the quality of SRAM must be guaranteed. To improve the quality of SRAM, we applied the improved wafer level bum-in process using multi cells selection method in addition to the previously used methods. That method is effective in detecting special failure. Finally, with the composition of some kind of methods, we could achieve the high quality of SRAM in Multi Chip Package.
https://doi.org/10.4313/TEEM.2004.5.4.138 인용 PDF KSCI

The Design of DRAM Memory Modules in the Fabrication by the MCM-L Technique (DRAM 메모리 모듈 제작에서 MCM-L 구조에 의한 설계)

Jee, Yong;Park, Tae-Byung
- Journal of the Korean Institute of Telematics and Electronics A
- /
- v.32A no.5
- /
- pp.737-748
- /
- 1995
In this paper, we studyed the variables in the design of multichip memory modules with 4M$\times$1bit DRAM chips to construct high capacity and high speed memory modules. The configuration of the module was 8 bit, 16 bit, and 32 bit DRAM modules with employing 0.6 W, 70 nsec 4M$\times$1 bit DRAM chips. We optimized routing area and wiring density by performing the routing experiment with the variables of the chip allocation, module I/O terminal, the number of wiring, and the number of mounting side of the chips. The multichip module was designed to be able to accept MCM-L techiques and low cost PCB materials. The module routing experiment showed that it was an efficient way to align chip I/O terminals and module I/O terminals in parallel when mounting bare chips, and in perpendicular when mounting packaged chips, to set module I/O terminals in two sides, to use double sided substrates, and to allocate chips in a row. The efficient number of wiring layer was 4 layers when designing single sided bare chip mounting modules and 6 layers when constructing double sided bare chip mounting modules whereas the number of wiring layer was 3 layers when using single sided packaged chip mounting substrates and 5 layers when constructing double sided packaged chip mounting substrates. The most efficient configuration was to mount bare chips on doubled substrates and also to increase the number of mounting chips. The fabrication of memory multichip module showed that the modules with bare chips can be reduced to a half in volume and one third in weight comparing to the module with packaged chips. The signal propagation delay time on module substrate was reduced to 0.5-1 nsec.
PDF

Development of an Embedded Bluetooth Audio Streaming Solution on SoC Platform (SoC 플랫폼 상에서 임베디드 블루투스 오디오 스트리밍 솔루션 개발)

Kim, Tae-Hyoun
- The KIPS Transactions:PartA
- /
- v.13A no.7 s.104
- /
- pp.589-598
- /
- 2006
In this paper, we describe the development and optimization of an embedded Biuetooth solution on an SoC platform for real-time audio streaming over a Bluetooth wireless link. The solution includes embedded Bluetooth protocol stack and profile simplemented on a virtual operating system for portability, and other optimization techniques to fully exploit the benefits of multimedia-oriented SoC. The optimization techniques implemented in this paper are memory access minimization by using on-chip scratch pad memory, codec library optimization with DSP and parallel memory access instruction set, and dynamic audio quality adjustment regarding current wireless link status. Experimental results show that the optimized solution presented in this paper can support high-qualify audio streaming without the support of external memory.
https://doi.org/10.3745/KIPSTA.2006.13A.7.589 인용 PDF KSCI

Programmable Memory BIST for Embedded Memory (내장 메모리를 위한 프로그램 가능한 자체 테스트)

Hong, Won-Gi;Chang, Hoon
- Journal of the Institute of Electronics Engineers of Korea SD
- /
- v.44 no.12
- /
- pp.61-70
- /
- 2007
The density of Memory has been increased by great challenge for memory technology. Therefore, elements of memory become more smaller than before and the sensitivity to faults increases. As a result of these changes, memory testing becomes more complex. In addition, as the number of storage elements per chip increases, the test cost becomes more remarkable as the cost per transistor drops. Recent development in system-on-chip (SOC) technology makes it possible to incorporate large embedded memories into a chip. However, it also complicates the test process, since usually the embedded memories cannot be controlled from the external environment. Proposed design doesn't need controls from outside environment, because it integrates into memory. In general, there are a variety of memory modules in SOC, and it is not possible to test all of them with a single algorithm. Thus, the proposed scheme supports the various memory testing process. Moreover, it is able to At-Speed test in a memory module. consequently, the proposed is more efficient in terms of test cost and test data to be applied.
PDF KSCI

A Concurrent Testing of DRAMs Utilizing On-Chip Networks (온칩네트워크를 활용한 DRAM 동시 테스트 기법)

Lee, Changjin;Nam, Jonghyun;Ahn, Jin-Ho
- Journal of the Semiconductor & Display Technology
- /
- v.19 no.2
- /
- pp.82-87
- /
- 2020
In this paper, we introduce the novel idea to improve the B/W usage efficiency of on-chip networks used for TAM to test multiple DRAMs. In order to avoid the local bottleneck of test packets caused by an ATE, we make test patterns using microcode-based instructions within ATE and adopt a test bus to transmit test responses from DRAM DFT (Design for Testability) called Test Generator (TG) to ATE. The proposed test platform will contribute to increasing the test economics of memory IC industry.
PDF KSCI

Implementation of 16Kpbs ADPCM by DSK50 (DSK50을 이용한 16kbps ADPCM 구현)

Cho, Yun-Seok;Han, Kyong-Ho
- Proceedings of the KIEE Conference
- /
- 1996.07b
- /
- pp.1295-1297
- /
- 1996
CCITT G.721, G.723 standard ADPCM algorithm is implemented by using TI's fixed point DSP start kit (DSK). ADPCM can be implemented on a various rates, such as 16K, 24K, 32K and 40K. The ADPCM is sample based compression technique and its complexity is not so high as the other speech compression techniques such as CELP, VSELP and GSM, etc. ADPCM is widely applicable to most of the low cost speech compression application and they are tapeless answering machine, simultaneous voice and fax modem, digital phone, etc. TMS320C50 DSP is a low cost fixed point DSP chip and C50 DSK system has an AIC (analog interface chip) which operates as a single chip A/D and D/A converter with 14 bit resolution, C50 DSP chip with on-chip memory of 10K and RS232C interface module. ADPCM C code is compiled by TI C50 C-compiler and implemented on the DSK on-chip memory. Speech signal input is converted into 14 bit linear PCM data and encoded into ADPCM data and the data is sent to PC through RS232C. The ADPCM data on PC is received by the DSK through RS232C and then decoded to generate the 14 bit linear PCM data and converted into the speech signal. The DSK system has audio in/out jack and we can input and out the speech signal.
PDF

Scratchpad-Memory Management Using NUMA Infrastructure on Linux (Linux 상에서 NUMA 지원을 응용한 스크래치 패드 메모리 관리방법)

Park, Byung-Hun;Seo, Dae-Wha
- Proceedings of the Korea Information Processing Society Conference
- /
- 2009.11a
- /
- pp.41-42
- /
- 2009
현재 많은 임베디드 SoC(System-On-Chip)에는 캐시 메모리의 단점을 보완하기 위해 온-칩(On-Chip) SRAM, 즉, SPM(Scratchpad Memory)를 내장하고 있으며 SPM은 그 특성상 캐시 메모리와 달리 소프트웨어가 직접 관리해야 한다. 본 논문에서는 NUMA를 지원하는 Linux 상에서 이식성이 높으면서 단순하게 구현할 수 있는 SPM 관리 방법을 제안한다.
https://doi.org/10.3745/PKIPS.y2009m11a.41 인용 PDF

Design of a Dingle-chip Multiprocessor with On-chip Learning for Large Scale Neural Network Simulation (대규모 신경망 시뮬레이션을 위한 칩상 학습가능한 단일칩 다중 프로세서의 구현)

김종문;송윤선;김명원
- Journal of the Korean Institute of Telematics and Electronics B
- /
- v.33B no.2
- /
- pp.149-158
- /
- 1996
In this paper we describe designing and implementing a digital neural chip and a parallel neural machine for simulating large scale neural netsorks. The chip is a single-chip multiprocessor which has four digiral neural processors (DNP-II) of the same architecture. Each DNP-II has program memory and data memory, and the chip operates in MIMD (multi-instruction, multi-data) parallel processor. The DNP-II has the instruction set tailored to neural computation. Which can be sed to effectively simulate various neural network models including on-chip learning. The DNP-II facilitates four-way data-driven communication supporting the extensibility of parallel systems. The parallel neural machine consists of a host computer, processor boards, a buffer board and an interface board. Each processor board consists of 8*8 array of DNP-II(equivalently 2*2 neural chips). Each processor board acn be built including linear array, 2-D mesh and 2-D torus. This flexibility supports efficiency of mapping from neural network models into parallel strucgure. The neural system accomplishes the performance of maximum 40 GCPS(giga connection per second) with 16 processor boards.
PDF

Search Result 296, Processing Time 0.028 seconds

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)