• Title/Summary/Keyword: Register bank

Search Result 14, Processing Time 0.022 seconds

An Efficient Architecture of Inter Layer Up-sampling in Scalable Video Decoder (SVC 복호화기에서 Inter Layer 업-샘플링의 효과적인 구조)

  • Ki, Dae-Wook;Kim, Jae-Ho
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.14 no.3
    • /
    • pp.621-627
    • /
    • 2010
  • This paper proposes an efficient architecture of Inter layer up-sampling in decoder for SVC(scalable video coding). A register bank for horizontal and vertical up-sampling and interpolation units are designed, by introducing the proposed architecture, 41% memory bandwidth is reduced compared to JSVM. For real-time operation for HD 6 layer decoder having CIF, SD, HD resolution for CGS layer, the hardware is designed to operate at 127MHz. The gate count is about 3000.

Design of an Optimized GPGPU for Data Reuse in DeepLearning Convolution (딥러닝 합성곱에서 데이터 재사용에 최적화된 GPGPU 설계)

  • Nam, Ki-Hun;Lee, Kwang-Yeob;Jung, Jun-Mo
    • Journal of IKEEE
    • /
    • v.25 no.4
    • /
    • pp.664-671
    • /
    • 2021
  • This paper proposes a GPGPU structure that can reduce the number of operations and memory access by effectively applying a data reuse method to a convolutional neural network(CNN). Convolution is a two-dimensional operation using kernel and input data, and the operation is performed by sliding the kernel. In this case, a reuse method using an internal register is proposed instead of loading kernel from a cache memory until the convolution operation is completed. The serial operation method was applied to the convolution to increase the effect of data reuse by using the principle of GPGPU in which instructions are executed by the SIMT method. In this paper, for register-based data reuse, the kernel was fixed at 4×4 and GPGPU was designed considering the warp size and register bank to effectively support it. To verify the performance of the designed GPGPU on the CNN, we implemented it as an FPGA and then ran LeNet and measured the performance on AlexNet by comparison using TensorFlow. As a result of the measurement, 1-iteration learning speed based on AlexNet is 0.468sec and the inference speed is 0.135sec.

Analysis of Ecological Data Repository Operation Status and EcoBank Service Proposal (생태 분야 데이터 리포지터리 운영 현황 분석 및 EcoBank 서비스 제안)

  • Juseop Kim;Hyosuk Kang;Suntae Kim
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.57 no.4
    • /
    • pp.289-310
    • /
    • 2023
  • Sharing and reusing data has become essential. Data repositories are a key tool for sharing and reusing this data. The purpose of this study is to propose the service of EcoBank, which is being built and operated by the National Institute of Ecology. To achieve the research purpose, 10 out of 123 foreign data repositories in the field of ecology registered on re3data.org were selected, investigated, and analyzed. As a result of the analysis, three services were derived in common. The three services consist of first, research data policy, second, research data quality review, and research data management training and workshops. Here, in order to share EcoBank's global data, it is necessary to register with a data repository registry such as re3data.org, and it is proposed that certification be promoted to ensure the reliability and quality of the repository.

A Study on the Use of Cargo Shipping Management System for Car Ferry (카페리 대상 화물 선적 관리시스템 활용에 대한 연구)

  • Hoon Lee;Seung-Il Lee
    • Proceedings of the Korean Institute of Navigation and Port Research Conference
    • /
    • 2021.11a
    • /
    • pp.19-20
    • /
    • 2021
  • Domestic car ferries do not operate a dedicated information system for cargo loading and management, It is impossible to arrange according to the cargo type and quantity at the time of cargo reservation due to restrictions such as the need to comply with the cargo layout drawings in advance approved by the Korean Register. For this reason, this is a study to utilize similar information systems in operation in container ships and terminals for the purpose of improving cargo shipment management tasks for car ferries.

  • PDF

Dual-Port SDRAM Optimization with Semaphore Authority Management Controller

  • Kim, Jae-Hwan;Chong, Jong-Wha
    • ETRI Journal
    • /
    • v.32 no.1
    • /
    • pp.84-92
    • /
    • 2010
  • This paper proposes the semaphore authority management (SAM) controller to optimize the dual-port SDRAM (DPSDRAM) in the mobile multimedia systems. Recently, the DPSDRAM with a shared bank enabling the exchange of data between two processors at high speed has been developed for mobile multimedia systems based on dual-processors. However, the latency of DPSDRAM caused by the semaphore for preventing the access contention at the shared bank slows down the data transfer rate and reduces the memory bandwidth. The methodology of SAM increases the data transfer rate by minimizing the semaphore latency. The SAM prevents the latency of reading the semaphore register of DPSDRAM, and reduces the latency of waiting for the authority of the shared bank to be changed. It also reduces the number of authority requests and the number of times authority changes. The experimental results using a 1 Gb DPSDRAM (OneDRAM) with the SAM controllers at 66 MHz show 1.6 times improvement of the data transfer rate between two processors compared with the traditional controller. In addition, the SAM shows bandwidth enhancement of up to 38% for port A and 31% for port B compared with the traditional controller.

Design and Implementation of Optical Flow Estimator for Moving Object Detection in Advanced Driver Assistance System (첨단운전자보조시스템용 이동객체검출을 위한 광학흐름추정기의 설계 및 구현)

  • Yoon, Kyung-Han;Jung, Yong-Chul;Cho, Jae-Chan;Jung, Yunho
    • Journal of Advanced Navigation Technology
    • /
    • v.19 no.6
    • /
    • pp.544-551
    • /
    • 2015
  • In this paper, the design and implementation results of the optical flow estimator (OFE) for moving object detection (MOD) in advanced driver assistance system (ADAS). In the proposed design, Brox's algorithm with global optimization is considered, which shows the high performance in the vehicle environment. In addition, Cholesky factorization is applied to solve Euler-Lagrange equation in Brox's algorithm. Also, shift register bank is incorporated to reduce memory access rate. The proposed optical flow estimator was designed with Verilog-HDL, and FPGA board was used for the real-time verification. Implementation results show that the proposed optical flow estimator includes the logic slices of 40.4K, 155 DSP48s, and block memory of 11,290Kbits.

Register-Based Parallel Pipelined Scheme for Synchronous DRAM (동기식 기억소자를 위한 레지스터를 이용한 병렬 파이프라인 방식)

  • Song, Ho Jun
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.32A no.12
    • /
    • pp.108-114
    • /
    • 1995
  • Recently, along wtih the advance of high-performance system, synchronous DRAM's (SDRAM's) which provide consecutive data output synchronized with an external clock signal, have been reported. However, in the conventional SDRAM's which utilize a multi-stage serial pipelined scheme, the column path is divided into multi-stages depending on CAS latency N. Thus, as the operating speed and CAS latency increase, new stages must be added, thereby causing a large area penalty due to additinal latches and I/O lines. In the proposed register-based parallel pipelined scheme, (N-1) registers are located between the read data bus line pair and the data output buffer and the coming data are sequentially stored. Since the column data path is not divided and the read data is directly transmitted to the registers, the busrt read operation can easily be achieved at higher frequencies without a large area penalty and degradation of internal timing margin. Simulation results for 0.32um-Tech. 4-Bank 64M SDRAM show good operation at 200MHz and an area increment is less than 0.1% when CAS latency N is increased from 3 to 4.. This pipelined scheme is more advantageous as the operating frequency increases.

  • PDF

Rapid Data Allocation Technique for Multiple Memory Bank Architectures (다중 메모리 뱅크 구조를 위한 고속의 자료 할당 기법)

  • 조정훈;백윤홍;최준식
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2003.10a
    • /
    • pp.196-198
    • /
    • 2003
  • Virtually every digital signal processors(DSPs) support on-chip multi- memory banks that allow the processor to access multiple words of data from memory in a single instruction cycle. Also, all existing fixed-point DSPs have irregular architecture of heterogeneous register which contains multiple register files that are distributed and dedicated to different sets of instructions. Although there have been several studies conducted to efficiently assign data to multi-memory banks, most of them assumed processors with relatively simple, homogeneous general-purpose resisters. Therefore, several vendor-provided compilers fer DSPs were unable to efficiently assign data to multiple data memory banks. thereby often failing to generate highly optimized code fer their machines. This paper presents an algorithm that helps the compiler to efficiently assign data to multi- memory banks. Our algorithm differs from previous work in that it assigns variables to memory banks in separate, decoupled code generation phases, instead of a single, tightly-coupled phase. The experimental results have revealed that our decoupled algorithm greatly simplifies our code generation process; thus our compiler runs extremely fast, yet generates target code that is comparable In quality to the code generated by a coupled approach

  • PDF

Design of a Dispatch Unit & Operand Selection Unit for Improving the SIMT Based GP-GPU Instruction Performance (SIMT구조 GP-GPU의 명령어 처리 성능 향상을 위한 Dispatch Unit과 Operand Selection Unit설계)

  • Kwak, Jae Chang
    • Journal of IKEEE
    • /
    • v.19 no.3
    • /
    • pp.455-459
    • /
    • 2015
  • This paper proposes a dispatch unit of GP-GPU with SIMT architecture to support the acceleration of general-purpose operation as well as graphics processing. If all the information of an operand used instructions issued from the warp scheduler is decoded, an unnecessary operand load occurs, resulting in register loads. To resolve this problem, this paper proposes a method that can reduce the operand load and the load on the resister by decoding only the information of the operand using a pre-decoding method. The operand information from the dispatch unit is passed to the operand selection unit with preventing register bank collisions. Thus the overall performance are improved. In the simulation test, the total clock cycles required by processing 10,000 arbitrary instructions issued from the wrap scheduler using ModelSim SE 10.0b are measured. It shows that the application of the dispatch unit equipped with the pre-decoding function proposed in this paper can make an improvement of about 12% in processing performance compared to the conventional method.

Banked Register File for ARM Thumb to Secure More Registers (다수의 레지스터를 확보하기 위한 ARM Thumb 레지스터 뱅크의 제안)

  • Lee Je-Hyung;Park Jinpyo;Moon Soo-Mook
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2005.07a
    • /
    • pp.781-783
    • /
    • 2005
  • ARM 프로세서는 내장형 시스템에서 가장 널리 사용되는 32비트 마이크로 프로세서 중 하나이며, Thumb 명령어 세트는 보다 작은 코드 크기를 위해 제공하는 16비트 확장 명령어 세트이다. Thumb의 약점중의 하나는 줄어든 명령어 길이 때문에 이용할 수 있는 레지스터의 개수가 반으로 줄어든다는 것인데 결과적으로 가용 레지스터의 부족으로 인해 spill 코드가 빈번하게 발생할 수 있다. 우리는 약간의 하드웨어 및 명령어 수정을 통해 뱅크(bank)로 이루어진 레지스터 파일을 제공하고자 한다. 이로 인해 컴파일러는 보다 여유 있는 레지스터를 확보하게 되어 spill 코드가 줄어들게 되므로 보다 작은 크기의 코드를 얻어낼 수 있다. 이 변화된 형태의 레지스터 파일을 운용하기 위한 효율적인 레지스터 할당기법이 요구되며, 제안하는 영역기반 레지스터 할당기법을 통해 이이 최적화된 Thumb 코드 대비 약 $5.1\%$의 코드 크기 감소효과를 볼 수 있었다.

  • PDF