Search | Korea Science

Adaptive Mapping Information Management Scheme for High Performance Large Sale Flash Memory Storages (고성능 대용량 플래시 메모리 저장장치의 효과적인 매핑정보 캐싱을 위한 적응적 매핑정보 관리기법)

Lee, Yongju;Kim, Hyunwoo;Kim, Huijeong;Huh, Taeyeong;Jung, Sanghyuk;Song, Yong Ho
- Journal of the Institute of Electronics and Information Engineers / v.50 no.3 / pp.78-87 / 2013
NAND flash memory is widely used as a storage medium in mobile devices, PCs, and workstations because it offers lower power consumption, higher performance, and better random access than a hard disk drive. However, NAND flash does not support in-place updates, so an entire block must be erased before a page in it can be overwritten. To overcome this drawback, flash storage devices need a software layer called the Flash Translation Layer (FTL). However, as high-performance, large-capacity NAND flash storage becomes widespread, mapping tables are outgrowing the limited DRAM capacity. In this paper, we propose an adaptive mapping information caching algorithm based on page mapping to solve this DRAM space shortage. The algorithm uses a mapping information caching scheme that minimizes the frequency of flash memory accesses, based on an analysis of several workloads. Experimental results show that the proposed algorithm improves performance by up to 70% compared with the previous mapping information caching algorithm.
https://doi.org/10.5573/ieek.2013.50.3.078
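
The abstract above describes keeping only part of a page-level mapping table in DRAM and fetching the rest from flash on demand. The sketch below illustrates that general idea with a plain LRU cache; it is not the adaptive policy proposed in the paper, which the abstract does not specify. The class name `MappingCache`, the dict standing in for the flash-resident table, and the read/write counters are all assumptions made for illustration.

```python
from collections import OrderedDict


class MappingCache:
    """LRU cache of logical-to-physical page mappings (illustrative only)."""

    def __init__(self, capacity, flash_table):
        self.capacity = capacity        # entries that fit in DRAM (assumed)
        self.flash_table = flash_table  # full table in flash, modelled as a dict
        self.cache = OrderedDict()      # lpn -> (ppn, dirty), oldest first
        self.flash_reads = 0            # mapping fetches from flash
        self.flash_writes = 0           # dirty mappings written back to flash

    def _evict_if_full(self):
        # Drop the least recently used entry; write it back only if modified.
        if len(self.cache) >= self.capacity:
            lpn, (ppn, dirty) = self.cache.popitem(last=False)
            if dirty:
                self.flash_table[lpn] = ppn
                self.flash_writes += 1

    def lookup(self, lpn):
        """Translate a logical page number, loading the mapping on a miss."""
        if lpn in self.cache:
            self.cache.move_to_end(lpn)
            return self.cache[lpn][0]
        self.flash_reads += 1
        ppn = self.flash_table[lpn]
        self._evict_if_full()
        self.cache[lpn] = (ppn, False)
        return ppn

    def update(self, lpn, new_ppn):
        """Record a remapping caused by an out-of-place page write."""
        if lpn not in self.cache:
            self._evict_if_full()
        self.cache[lpn] = (new_ppn, True)
        self.cache.move_to_end(lpn)
```

In this model, every cache miss costs one flash read and every dirty eviction costs one flash write, so the two counters give a rough measure of the flash access frequency that a caching policy tries to minimize.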

Efficient Indirect Branch Predictor Based on Data Dependence (효율적인 데이터 종속 기반의 간접 분기 예측기)

Paik Kyoung-Ho;Kim Eun-Sung
- Journal of the Institute of Electronics Engineers of Korea CI / v.43 no.4 s.310 / pp.1-14 / 2006
The indirect branch instruction is one of the most substantial obstacles to exploiting ILP in modern high-performance processors. The target address of an indirect branch is polymorphic and varies dynamically, so it is very difficult to predict accurately. As a result, the performance of a speculative processor drops significantly because of the many execution cycles lost on each misprediction. We proposed a novel and highly accurate indirect branch prediction scheme called data-dependence-based prediction. The predictor achieves a prediction accuracy of 98.92% with 1K entries and 99.95% with 8K entries. However, all proposed indirect predictors, including ours, have a large hardware overhead for storing expected target addresses as well as tags for alleviating aliasing. Hence, we propose a scheme that minimizes the hardware overhead without sacrificing prediction accuracy. Our experimental results show that the tag overhead is reduced by about 60% with no performance loss, and by about 80% at a performance loss of only 0.1%. The overhead of storing target addresses is reduced by about 35% with no performance loss, and by about 45% at a performance loss of only 1.11%.

A Study of High-Quality Factor Solenoid-Type RF Chip Inductor Utilizing Amorphous $Al_2O_3$ Core Material (비정질 $Al_2O_3$ 코아 재료를 이용한 Solenoid 형태의 고품질 RF chip 인덕터에 관한 연구)

Lee, Jae-Wook;Jung, Young-Chang;Yun, Eui-Jung;Hong, Chol-Ho
- Journal of the Institute of Electronics Engineers of Korea SD / v.37 no.6 / pp.34-42 / 2000
Recently, there has been a growing need to develop small RF chip inductors that operate up to the GHz range in order to realize high-performance, micro-fabricated wireless communication products. However, ferrite-based chip inductors cannot be used above 300 MHz because of the limited permeability of the material. In this work, small, high-performance RF chip inductors using an amorphous $Al_2O_3$ core material were investigated. Copper (Cu) wire with a diameter of 40 ${\mu}m$ was used for the coils, and the fabricated chip inductors measure $2.1mm{\times}1.5mm{\times}1.0mm$. The external current source was applied after bonding the Cu coil leads to gold pads electroplated on the bottom edges of the core material. The composition of the core material was measured using EDX. The high-frequency characteristics of the inductance (L), quality factor (Q), and impedance (Z) of the developed inductors were measured using an RF Impedance/Material Analyzer (HP4291B with HP16193A test fixture). The developed inductors have a self-resonant frequency (SRF) of 1 to 3.5 GHz and exhibit an L of 22 to 150 nH. The L of the inductors decreases as the SRF increases. The Z of the inductors reaches its maximum at the SRF, and the inductors have a quality factor of 70 to 97 in the frequency range of 500 MHz to 1.5 GHz.

A Pipelined Parallel Optimized Design for Convolution-based Non-Cascaded Architecture of JPEG2000 DWT (JPEG2000 이산웨이블릿변환의 컨볼루션기반 non-cascaded 아키텍처를 위한 pipelined parallel 최적화 설계)

Lee, Seung-Kwon;Kong, Jin-Hyeung
- Journal of the Institute of Electronics Engineers of Korea SD / v.46 no.7 / pp.29-38 / 2009
In this paper, a high-performance pipelined computing design consisting of a parallel multiplier, a temporal buffer, and a parallel accumulator is presented for the convolution-based non-cascaded architecture, targeting real-time Discrete Wavelet Transform (DWT) processing. The convolution multiplications of the DWT can be reduced up to 1/4 by exploiting the symmetry of the filter coefficients and the up/down sampling, and computation becomes 3 to 5 times faster with LUT-based distributed arithmetic (DA) multiplication, which computes the product terms of the image data with multiple filter coefficients in parallel. Furthermore, computed product terms are reused by storing them in the temporal buffer, which saves 50% of the computation as well as of the dynamic power. The product terms of image data and filter coefficients are realigned and stored in the temporal buffer for accumulation. The buffer then manages this parallel, aligned storage so that the parallel accumulators can retrieve the terms sequentially at high speed. The convolution is pipelined across the parallel multiplier, temporal buffer, and parallel accumulator stages, with the parallelism of the temporal buffer and accumulator optimized with respect to the performance of the parallel DA multiplier to improve pipeline performance. The proposed architecture was taken through back-end design with a 0.18 ${\mu}m$ library, which verified a throughput of 30 fps for SVGA (800$\times$600) images at 90 MHz.

An Improvement of Implementation Method for Multi-Layer AHB BusMatrix (ML-AHB 버스 매트릭스 구현 방법의 개선)

Hwang Soo-Yun;Jhang Kyoung-Sun
- Journal of KIISE:Computer Systems and Theory / v.32 no.11_12 / pp.629-638 / 2005
In System on a Chip design, the on-chip bus is one of the critical factors that determine overall system performance. In particular, when reusing IPs that require high bandwidth, such as processors, DSPs, and multimedia IPs, the bandwidth problems of the on-chip bus become more serious. ARM recently proposed the Multi-Layer AHB BusMatrix, a highly efficient on-chip bus, to solve these bandwidth problems. The Multi-Layer AHB BusMatrix allows parallel access paths between multiple masters and slaves in a system. This is achieved by using a more complex interconnection matrix, and it provides increased overall bus bandwidth and a more flexible system architecture. However, in the existing Multi-Layer AHB BusMatrix, each master incurs a one-clock-cycle delay whenever it starts a new transaction or changes slave layers, because the Input Stage and the arbitration logic are realized as Moore-type logic. In this paper, we improve the existing Multi-Layer AHB BusMatrix architecture to eliminate the one-clock-cycle delay and to reduce the area overhead of the Input Stage. By eliminating the Input Stage and placing some restrictions on the arbitration scheme, we remove the one-clock-cycle delay and reduce the area overhead. Experimental results show that, when a number of transactions are executed with the 4-beat incrementing burst type, the end time of the total bus transaction and the average latency of the improved Multi-Layer AHB BusMatrix improve by 20% and 24%, respectively. In addition, the total area and the clock period are reduced by 22% and 29%, respectively, compared with the existing Multi-Layer AHB BusMatrix.

An Energy-Delay Efficient System with Adaptive Victim Caches (선택적 희생 캐쉬를 이용한 저전력 고성능 시스템 설계 방안)

Kim Cheol Hong;Shim Sunghoon;Jhon Chu Shik;Jhang Seong Tae
- Journal of KIISE:Computer Systems and Theory / v.32 no.11_12 / pp.663-674 / 2005
We propose a system that aims to achieve high energy-delay efficiency by using adaptive victim caches. In particular, we investigate methods to improve the hit rate in the first level of the memory hierarchy, which reduces the number of accesses to more power-consuming memory structures such as the L2 cache. A victim cache is a memory element that reduces conflict misses in a direct-mapped L1 cache. We present two techniques that fill the victim cache with the blocks that are more likely to be re-requested by the processor. The hit-based victim cache is filled with blocks that were referenced frequently by the processor. The replacement-based victim cache is filled with blocks that were evicted from sets where block replacements happened frequently. According to our simulations, the replacement-based victim cache scheme outperforms the conventional victim cache scheme by about 2% on average and reduces power consumption by up to 8%.
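
To make the replacement-based fill policy concrete, here is a minimal simulation sketch of a direct-mapped L1 cache backed by a small, fully associative victim cache. It is an illustration under assumptions, not the authors' simulator: the class name `VictimCachedL1`, the per-set replacement counter, and the `min_replacements` threshold are hypothetical. With `min_replacements=0` the model behaves like a conventional victim cache that accepts every evicted block.

```python
from collections import OrderedDict


class VictimCachedL1:
    """Direct-mapped L1 plus a small LRU victim cache (illustrative only)."""

    def __init__(self, num_sets, victim_entries, min_replacements=0):
        self.num_sets = num_sets
        self.l1 = [None] * num_sets       # one block address per set
        self.victim = OrderedDict()       # block address -> True, oldest first
        self.victim_entries = victim_entries
        # Hypothetical threshold: only sets that have seen at least this many
        # replacements may place their evicted blocks in the victim cache.
        self.min_replacements = min_replacements
        self.replacements = [0] * num_sets
        self.l1_hits = self.victim_hits = self.misses = 0

    def _to_victim(self, block, index):
        if block is None or self.replacements[index] < self.min_replacements:
            return
        self.victim[block] = True
        self.victim.move_to_end(block)
        if len(self.victim) > self.victim_entries:
            self.victim.popitem(last=False)  # drop the least recently used

    def access(self, block):
        """Access a block address; return 'l1', 'victim', or 'miss'."""
        index = block % self.num_sets
        if self.l1[index] == block:
            self.l1_hits += 1
            return "l1"
        evicted = self.l1[index]
        if evicted is not None:
            self.replacements[index] += 1
        if block in self.victim:
            del self.victim[block]           # block moves back into L1
            self.victim_hits += 1
            result = "victim"
        else:
            self.misses += 1                 # would go to L2 in a real system
            result = "miss"
        self.l1[index] = block
        self._to_victim(evicted, index)
        return result
```

Two block addresses that map to the same set (for example 0 and 4 with four sets) ping-pong in a plain direct-mapped cache; in this model the second round of accesses is served by the victim cache instead of counting as misses.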

Color Media Instructions for Embedded Parallel Processors (임베디드 병렬 프로세서를 위한 칼라미디어 명령어 구현)

Kim, Cheol-Hong;Kim, Jong-Myon
- Journal of KIISE:Computer Systems and Theory / v.35 no.7 / pp.305-317 / 2008
As the mobile computing environment changes rapidly, increasing user demand for multimedia-over-wireless capabilities on embedded processors places constraints on performance, power, and size. In this regard, this paper proposes color media instructions (CMI) for single instruction, multiple data (SIMD) parallel processors to meet the computational requirements and cost goals. While existing multimedia extensions store and process 48-bit pixels in a 32-bit register, CMI, which takes into account that color components are perceptually less significant, supports parallel operations on two packed, compressed 16-bit YCbCr values (6 bits for Y and 5 bits each for Cb and Cr) in a 32-bit datapath processor. This provides greater concurrency and efficiency for YCbCr data processing. Moreover, the smaller data format reduces system cost, and the reduction in data bandwidth simplifies system design. Experimental results on a representative SIMD parallel processor architecture show that CMI achieves an average speedup of 6.3x over the baseline SIMD parallel processor. In contrast, MMX (Intel's representative multimedia extension) achieves an average speedup of only 3.7x over the same baseline SIMD architecture. CMI also outperforms MMX in both area efficiency (a 52% increase versus a 13% increase) and energy efficiency (a 50% increase versus an 11% increase). CMI improves performance and efficiency with a mere 3% increase in system area and a 5% increase in system power, while MMX requires a 14% increase in system area and a 16% increase in system power.
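
The packed data format described above can be sketched in a few lines. The abstract gives only the field widths (6 bits for Y, 5 bits each for Cb and Cr) and the fact that two 16-bit pixels share a 32-bit word; the bit ordering, the truncation of 8-bit components, and the function names below are assumptions for illustration, not the paper's actual encoding.

```python
def pack_pixel(y, cb, cr):
    """Compress 8-bit Y, Cb, Cr into 16 bits: 6-bit Y, 5-bit Cb, 5-bit Cr.

    Assumed layout: Y in bits 15..10, Cb in bits 9..5, Cr in bits 4..0.
    """
    return ((y >> 2) << 10) | ((cb >> 3) << 5) | (cr >> 3)


def unpack_pixel(p):
    """Expand a 16-bit pixel back to 8-bit components (low bits are lost)."""
    y = ((p >> 10) & 0x3F) << 2
    cb = ((p >> 5) & 0x1F) << 3
    cr = (p & 0x1F) << 3
    return y, cb, cr


def pack_word(p0, p1):
    """Place two 16-bit pixels in one 32-bit word (p0 in the low half)."""
    return ((p1 & 0xFFFF) << 16) | (p0 & 0xFFFF)


def unpack_word(w):
    """Split a 32-bit word into its two 16-bit pixels."""
    return w & 0xFFFF, (w >> 16) & 0xFFFF


word = pack_word(pack_pixel(200, 100, 50), pack_pixel(255, 255, 255))
print(hex(word), [unpack_pixel(p) for p in unpack_word(word)])
```

Giving Y one more bit than Cb and Cr reflects the point made in the abstract that the color components are perceptually less significant than luminance.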

Separation and Determination of Co(II) and Ni(II) Ion as their 4-(2-Pyridylazo) resorcinol Chelates by Reversed-Phase Capillary High-Performance Liquid Chromatography (역상 모세관-고성능 액체 크로마토그래피에 의한 코발트와 니켈 이온의 4-(2-피리딜아조)레조루신올 킬레이트로서의 분리 및 정량)

Chung, Yong-Soon;Chung, Won-Seog
- Journal of the Korean Chemical Society / v.47 no.6 / pp.547-552 / 2003
Separation and determination of Co(II) and Ni(II) ions as their 4-(2-pyridylazo)resorcinol (PAR) chelates were performed by reversed-phase capillary high-performance liquid chromatography (RP-CpHPLC). Among many capillary columns, a Vydac C4 column was selected, and an acetonitrile (MeCN) solution was used as the mobile phase. The effects of pH and MeCN concentration (%) on the retention factor (k) and peak intensity were examined and discussed. As a result, a mobile phase of 22.5% MeCN at pH 5.60 was found to be adequate for separating the two metal ions and for determining the Co(II) ion, whereas the mobile phase condition for determining the Ni(II) ion was 22.5% MeCN at pH 7.20. The detection limits (D.L., S/N=3) of Co(II) and Ni(II) ions were $2.0{\times}10^{-7}$ M (14.9 ppb) and $1.0{\times}10^{-6}$ M (59.2 ppb), respectively.
https://doi.org/10.5012/jkcs.2003.47.6.547

Design of a Pipelined Deblocking Filter with efficient memory management for high performance H.264 decoders (효율적인 메모리 관리 구조를 갖는 H.264용 고성능 디블록킹 필터 설계)

Yu, Yong-Hoon;Lee, Chan-Ho
- Journal of the Institute of Electronics Engineers of Korea SD / v.45 no.1 / pp.64-70 / 2008
The H.264 standard is widely used because of its high compression rate and quality. The deblocking filter of the H.264 standard improves image quality by eliminating blocking artifacts in pictures, and it requires a lot of computation. We propose a new hardware architecture for the deblocking filter that combines a pipelined design, 1-D filters that support both horizontal and vertical filtering, and efficient memory management. Four memory blocks are configured for efficient storage of and access to the current macroblock and the adjacent referenced sub-macroblocks, and the pixel data from the motion compensation unit can be transferred without waiting during the computation cycles of the deblocking filter. The proposed architecture reduces the number of computation cycles and the hardware area, and it improves the performance of the H.264 decoder. We designed the deblocking filter in Verilog-HDL and implemented it on an FPGA. The designed deblocking filter can be used for decoding HD-quality images at 77 MHz.

HVIA-GE: A Hardware Implementation of Virtual Interface Architecture Based On Gigabit Ethernet (HVIA-GE: 기가비트 이더넷에 기반한 Virtual Interface Architecture의 하드웨어 구현)

박세진;정상화;윤인수
- Journal of KIISE:Computer Systems and Theory / v.31 no.5_6 / pp.371-378 / 2004
This paper presents the implementation and performance of the HVIA-GE card, which is a hardware implementation of the Virtual Interface Architecture (VIA) based on Gigabit Ethernet. The HVIA-GE card is a 32-bit/33 MHz PCI adapter containing an FPGA for the VIA protocol engine and a Gigabit Ethernet chip set to construct a high-performance physical network. HVIA-GE performs virtual-to-physical address translation, Doorbell, and send/receive completion operations in hardware without kernel intervention. In particular, the Address Translation Table (ATT) is stored in the local memory of the HVIA-GE card, and the VIA protocol engine efficiently controls the address translation process by directly accessing the ATT. As a result, the communication overhead during send/receive transactions is greatly reduced. Our experimental results show a maximum bandwidth of 93.7 MB/s and a minimum latency of 11.9 ${\mu}s$. In terms of minimum latency, HVIA-GE is 4.8 times and 9.9 times faster than M-VIA and TCP/IP, respectively, over Gigabit Ethernet. In addition, the maximum bandwidth of HVIA-GE is 50.4% and 65% higher than that of M-VIA and TCP/IP, respectively.

