• Title/Summary/Keyword: memory bandwidth reduction

Search Result 22, Processing Time 0.023 seconds

An Energy-Efficient Matching Accelerator Using Matching Prediction for Mobile Object Recognition

  • Choi, Seongrim;Lee, Hwanyong;Nam, Byeong-Gyu
    • JSTS:Journal of Semiconductor Technology and Science
    • /
    • v.16 no.2
    • /
    • pp.251-254
    • /
    • 2016
  • An energy-efficient object matching accelerator is proposed for mobile object recognition based on matching prediction scheme. Conventionally, vocabulary tree has been used to save the external memory bandwidth in object matching process but involved massive internal memory transactions to examine each object in a database. In this paper, a novel object matching accelerator is proposed based on matching predictions to reduce unnecessary internal memory transactions by mitigating non-target object examinations, thereby improving the energy-efficiency. Experimental results show a 26% reduction in power-delay product compared to the prior art.

Bringing 3D ICs to Aerospace: Needs for Design Tools and Methodologies

  • Lim, Sung Kyu
    • Journal of information and communication convergence engineering
    • /
    • v.15 no.2
    • /
    • pp.117-122
    • /
    • 2017
  • Three-dimensional integrated circuits (3D ICs), starting with memory cubes, have entered the mainstream recently. The benefits many predicted in the past are indeed delivered, including higher memory bandwidth, smaller form factor, and lower energy. However, 3D ICs have yet to find their deployment in aerospace applications. In this paper we first present key design tools and methodologies for high performance, low power, and reliable 3D ICs that mainly target terrestrial applications. Next, we discuss research needs to extend their capabilities to ensure reliable operations under the harsh space environments. We first present a design methodology that performs fine-grained partitioning of functional modules in 3D ICs for power reduction. Next, we discuss our multi-physics reliability analysis tool that identifies thermal and mechanical reliability trouble spots in the given 3D IC layouts. Our tools will help aerospace electronics designers to improve the reliability of these 3D IC components while not degrading their energy benefits.

Embedded Compression Codec Algorithm for Motion Compensated Wavelet Video Coding System (움직임 보상된 웨이블릿 기반의 비디오 코딩 시스템에 적용 가능한 임베디드 압축 코덱 알고리즘)

  • Kim, Song-Ju
    • The Journal of the Korea Contents Association
    • /
    • v.12 no.3
    • /
    • pp.77-83
    • /
    • 2012
  • In this paper, a low-complexity embedded compression (EC) Codec algorithm for the wavelet video coder is applied to reduce excessive external memory requirements. The EC algorithm is used to achieve a fixed compression ratio of 50 % under the near-lossless-compression constraint. The EC technique can reduce the 50 % memory requirement for intermediate low-frequency coefficients during multiple discrete wavelet transform stages compared with direct implementation of the wavelet video encoder of this paper. Furthermore, the EC scheme based on a forward adaptive quantization and fixed length coding can save bandwidth and size of buffer between DWT and SPIHT to 50 %. Simulation results show that our EC algorithm present only PSNR degradation of 0.179 and 0.162 dB in average when the target bit-rate of the video coder are 1 and 0.5 bpp, respectively.

Overview of High Performance 3D-WLP

  • Kim, Eun-Kyung
    • Korean Journal of Materials Research
    • /
    • v.17 no.7
    • /
    • pp.347-351
    • /
    • 2007
  • Vertical interconnect technology called 3D stacking has been a major focus of the next generation of IC industries. 3D stacked devices in the vertical dimension give several important advantages over conventional two-dimensional scaling. The most eminent advantage is its performance improvement. Vertical device stacking enhances a performance such as inter-die bandwidth improvements, RC delay mitigation and geometrical routing and placement advantages. At present memory stacking options are of great interest to many industries and research institutes. However, these options are more focused on a form factor reduction rather than the high performance improvements. In order to improve a stacked device performance significantly vertical interconnect technology with wafer level stacking needs to be much more progressed with reduction in inter-wafer pitch and increases in the number of stacked layers. Even though 3D wafer level stacking technology offers many opportunities both in the short term and long term, the full performance benefits of 3D wafer level stacking require technological developments beyond simply the wafer stacking technology itself.

CNN based Image Restoration Method for the Reduction of Compression Artifacts (압축 왜곡 감소를 위한 CNN 기반 이미지 화질개선 알고리즘)

  • Lee, Yooho;Jun, Dongsan
    • Journal of Korea Multimedia Society
    • /
    • v.25 no.5
    • /
    • pp.676-684
    • /
    • 2022
  • As realistic media are widespread in various image processing areas, image or video compression is one of the key technologies to enable real-time applications with limited network bandwidth. Generally, image or video compression cause the unnecessary compression artifacts, such as blocking artifacts and ringing effects. In this study, we propose a Deep Residual Channel-attention Network, so called DRCAN, which consists of an input layer, a feature extractor and an output layer. Experimental results showed that the proposed DRCAN can reduced the total memory size and the inference time by as low as 47% and 59%, respectively. In addition, DRCAN can achieve a better peak signal-to-noise ratio and structural similarity index measure for compressed images compared to the previous methods.

A PCA-based Data Stream Reduction Scheme for Sensor Networks (센서 네트워크를 위한 PCA 기반의 데이터 스트림 감소 기법)

  • Fedoseev, Alexander;Choi, Young-Hwan;Hwang, Een-Jun
    • Journal of Internet Computing and Services
    • /
    • v.10 no.4
    • /
    • pp.35-44
    • /
    • 2009
  • The emerging notion of data stream has brought many new challenges to the research communities as a consequence of its conceptual difference with conventional concepts of just data. One typical example is data stream processing in sensor networks. The range of data processing considerations in a sensor network is very wide, from physical resource restrictions such as bandwidth, energy, and memory to the peculiarities of query processing including continuous and specific types of queries. In this paper, as one of the physical constraints in data stream processing, we consider the problem of limited memory and propose a new scheme for data stream reduction based on the Principal Component Analysis (PCA) technique. PCA can transform a number of (possibly) correlated variables into a (smaller) number of uncorrelated variables. We adapt PCA for the data stream of a sensor network assuming the cooperation of a query engine (or application) with a network base station. Our method exploits the spatio-temporal correlation among multiple measurements from different sensors. Finally, we present a new framework for data processing and describe a number of experiments under this framework. We compare our scheme with the wavelet transform and observe the effect of time stamps on the compression ratio. We report on some of the results.

  • PDF

A New Network Bandwidth Reduction Method of Distributed Rendering System for Scalable Display (확장형 디스플레이를 위한 분산 렌더링 시스템의 네트워크 대역폭 감소 기법)

  • Park, Woo-Chan;Lee, Won-Jong;Kim, Hyung-Rae;Kim, Jung-Woo;Han, Tack-Don;Yang, Sung-Bong
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.29 no.10
    • /
    • pp.582-588
    • /
    • 2002
  • Scalable displays generate large and high resolution images and provide an immersive environment. Recently, scalable displays are built on the networked clusters of PCs, each of which has a fast graphics accelerator, memory, CPU, and storage. However, the distributed rendering on clusters is a network bound work because of limited network bandwidth. In this paper, we present a new algorithm for reducing the network bandwidth and implement it with a conventional distributed rendering system. This paper describes the algorithm called geometry tracking that avoids the redundant geometry transmission by indexing geometry data. The experimental results show that our algorithm reduces the network bandwidth up to 42%.

Bi-directional Bus Architecture Suitable to Multitasking in MPEG System (MPEG 시스템용 다중 작업에 적합한 양방향 버스 구조)

  • Jun Chi-hoon;Yeon Gyu-sung;Hwang Tae-jin;Wee Jae-Kyung
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.42 no.4 s.334
    • /
    • pp.9-18
    • /
    • 2005
  • This paper proposes the novel synchronous segmented bus architecture that has the pipeline bus architecture based on OCP(open core protocol) and the memory-oriented bus for MPEG system. The proposed architecture has bus architectures that support the memory interface for image data processing of MPEG system. Also it has the segmented hi-directional multiple bus architecture for multitasking processing by using multi -masters/multi - slave. In the scheme address of masters and slaves are fixed so that they are arranged for the location of IP cores according to operational characteristics of the system for efficient data processing. Also the bus architecture adopts synchronous segmented bus architecture for reuse of IP's and architecture or developed chips. This feature is suitable to the high performance and low power multimedia SoC systum by inherent characteristics of multitasking operation and segmented bus. Proposed bus architecture can have up to 3.7 times improvement in the effective bandwidth md up to 4 times reduction in the communication latency.

Twiddle Factor Index Generate Method for Memory Reduction in R2SDF FFT (R2SDF FFT의 메모리 감소를 위한 회전인자 인덱스 생성방법)

  • Yang, Seung-Won;Kim, Yong-Eun;Lee, Jong-Yeol
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.5
    • /
    • pp.32-38
    • /
    • 2009
  • FTT(Fast Fourier Transform) processor is widely used in OFDM(Orthogonal Frequency Division Multiplesing) system. Because of the increased requirement of mobility and bandwidth in the OFDM system, they need large point FTT processor. Since the size of memory which stores the twiddle factor coefficients are proportional to the N of FFT size, we propose a new method by which we can reduce the size of the coefficient memory. In the proposed method, we exploit a counter and unsigned multiplier to generate the twiddle factor indices. To verify the proposed algorithm, we design TFCGs(Twiddle Factor Coefficient Generator) for 1024pint FFTs with R2SDF(Radix-2 Single-Path Delay Feedback), $R2^3SDF,\;R2^3SDF,\;R2^4SDF$ architectures. The size of ROM is reduced to 1/8N. In the case of $R2^4SDF$ architecture, the area and the power are reduced by 57.9%, 57.5% respectively.

Fast GPU Implementation for the Solution of Tridiagonal Matrix Systems (삼중대각행렬 시스템 풀이의 빠른 GPU 구현)

  • Kim, Yong-Hee;Lee, Sung-Kee
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.32 no.11_12
    • /
    • pp.692-704
    • /
    • 2005
  • With the improvement of computer hardware, GPUs(Graphics Processor Units) have tremendous memory bandwidth and computation power. This leads GPUs to use in general purpose computation. Especially, GPU implementation of compute-intensive physics based simulations is actively studied. In the solution of differential equations which are base of physics simulations, tridiagonal matrix systems occur repeatedly by finite-difference approximation. From the point of view of physics based simulations, fast solution of tridiagonal matrix system is important research field. We propose a fast GPU implementation for the solution of tridiagonal matrix systems. In this paper, we implement the cyclic reduction(also known as odd-even reduction) algorithm which is a popular choice for vector processors. We obtained a considerable performance improvement for solving tridiagonal matrix systems over Thomas method and conjugate gradient method. Thomas method is well known as a method for solving tridiagonal matrix systems on CPU and conjugate gradient method has shown good results on GPU. We experimented our proposed method by applying it to heat conduction, advection-diffusion, and shallow water simulations. The results of these simulations have shown a remarkable performance of over 35 frame-per-second on the 1024x1024 grid.