• Title/Summary/Keyword: 사이클 테이블

Search Result 11, Processing Time 0.026 seconds

A New Pipelined Divider with a Small Lookup Table (작은 룩업테이블을 가지는 새로운 파이프라인 나눗셈기)

  • Jeong, Woong;Park, Woo-Chan;Kwak, Sung-Ho;Yang, Hoon-Mo;Jeong, Cheol-Ho;Han, Tack-Don;Lee, Moon-Key
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.40 no.9
    • /
    • pp.724-733
    • /
    • 2003
  • Generally, dividers have been designed to use iteration, but recently the research on the pipelined divider is underway. It is a difficult point in the known pipelined division unit that a large lookup table is required. In this paper, the cost-effective pipelined divider is proposed, that needs a lookup table smaller than that of the other pipelined divider. The latency of the proposed divider is 3 cycles. We obtain a 30% reduced area than that of P. Hung.

A Prefetch Architecture with Efficient Branch Prediction for a 64-bit 4-way Superscalar Microprocessor (64비트 4-way 수퍼스칼라 마이크로프로세서의 효율적인 분기 예측을 수행하는 프리페치 구조)

  • 문상국;문병인;이용환;이용석
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.25 no.11B
    • /
    • pp.1939-1947
    • /
    • 2000
  • 본 논문에서는 명령어의 효율적인 페치를 위해 분기 타겟 주소 전체를 사용하지 않고 캐쉬 메모리(cache memory) 내의 적은 비트 수로 인덱싱 하여 한 클럭 사이클 안에 최대 4개의 명령어를 다음 파이프라인으로 보내줄 수 있는 방법을 제시한다. 본 프리페치 유닛은 크게 나누어 3개의 영역으로 나눌 수 있는데, 분기에 관련하여 미리 부분적으로 명령어를 디코드 하는 프리디코드(predecode) 블록, 타겟 주소(NTA : Next Target Address) 테이블 영역을 추가시킨 명령어 캐쉬(instruction cache) 블록, 전체 유닛을 제어하고 가상 주소를 관리하는 프리페치(prefetch) 블록으로 나누어진다. 사용된 명령어들은 SPARC(Scalable Processor ARChitecture) V9에 기준 하였고 구현은 Verilog-HDL(Hardwave Description Language)을 사용하여 기능 수준으로 기술되고 검증되었다. 구현된 프리페치 유닛은 명령어 흐름에 분기가 존재하더라도 단일 사이클 안에 4개까지의 명령어들을 정확한 예측 하에 다음 파이프라인으로 보내줄 수 있다. 또한 NTA를 사용한 방법은 같은 수의 레지스터 비트를 사용하였을 때 BTB(Branch Target Buffer)를 사용하는 방법과 비교하여 2배정도 많은 개수의 분기 명령 주소를 저장할 수 있는 장점이 있다.

  • PDF

An Analysis of Power Dissipation of Value Prediction in Superscalar Processors (슈퍼스칼라 프로세서에서의 값 예측의 전력 소모 측정 및 분석)

  • 이명근;이상정
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2002.10c
    • /
    • pp.688-690
    • /
    • 2002
  • 고성능 슈퍼스칼라 프로세서에서는 명령어 수준 병렬성(Instruction Level Parallelism, ILP)의 장애인 명령어간의 종속 관계 중 데이터 종속관계를 극복하기 위해 값 예측기를 이용하여 모험적으로 명령어들을 실행한다. 값 예측 시에 필요한 테이블 참조와 값 예측 실패 시 실행되는 잘못된 명령어의 실행은 프로세서의 부가적인 전력 소모를 요구한다. 본 논문에서는 값 예측기와 Cai-Lim의 전력모델을 슈퍼스칼라 프로세서 사이클 수준 시뮬레이터인 SimpleScalar 3.0 툴셋에 삽입하여 전력 소모량을 측정하고 분석한다.

  • PDF

A Study on the Design of Index Table Drive of Rotary Transfer Machines to Reduce Cycle Time (사이클 타임 단축을 위한 로터리 트랜스퍼 머신의 인덱스 테이블 구동부 설계에 관한 연구)

  • Huh, Ki-Seok;Park, Yong-Woo;Kim, Dong-Seon;Lyu, Sung-Ki
    • Journal of the Korean Society of Manufacturing Process Engineers
    • /
    • v.21 no.8
    • /
    • pp.60-65
    • /
    • 2022
  • This study focuses on the driving control design of an index, which is a key component of a rotary transfer machine that is effective in improving productivity and reducing manufacturing costs by shortening cycle time. Although various index studies have been conducted on the rotation of workpieces such as general-purpose machine tools and tilting indices, the development of an index for rotary transfer machines for transfer is insufficient. The index consists of a body, table, hydraulic cylinder, motor, reducer, and curved coupling. The torque of the table for driving was selected, and the angular velocity and torque pattern were simulated using the motor manufacturer's program. The specifications of the drive motor were determined based on the selected torque.

Parallel Architecture Design of H.264/AVC CAVLC for UD Video Realtime Processing (UD(Ultra Definition) 동영상 실시간 처리를 위한 H.264/AVC CAVLC 병렬 아키텍처 설계)

  • Ko, Byung Soo;Kong, Jin-Hyeung
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.50 no.5
    • /
    • pp.112-120
    • /
    • 2013
  • In this paper, we propose high-performance H.264/AVC CAVLC encoder for UD video real time processing. Statistical values are obtained in one cycle through the parallel arithmetic and logical operations, using non-zero bit stream which represents zero coefficient or non-zero coefficient. To encode codeword per one cycle, we remove recursive operation in level encoding through parallel comparison for coefficient and escape value. In oder to implement high-speed circuit, proposed CAVLC encoder is designed in two-stage {statical scan, codeword encoding} pipeline. Reducing the encoding table, the arithmetic unit is used to encode non-coefficient and to calculate the codeword. The proposed architecture was simulated in 0.13um standard cell library. The gate count is 33.4Kgates. The architecture can support Ultra Definition Video ($3840{\times}2160$) at 100 frames per second by running at 100MHz.

Design of Hardware Accelerator for Portable Real-time MP3 Audio Encoder (휴대용 실시간 MP 오디오 부호화기를 위한 하드웨어 가속기 설계)

  • 여창훈;방경호;이근섭;박영철;윤대희
    • Proceedings of the IEEK Conference
    • /
    • 2003.07e
    • /
    • pp.2132-2135
    • /
    • 2003
  • 본 논문에서는 고정소수점 DSP로 구현한 실시간 MP3 오디오 부호화기에 사용되는 초월함수용 하드웨어 가속기 구조를 제안한다. 구현된 하드웨어 가속기는 MP3 부호화 성능을 저하시키는 초월함수 연산오차에 강인하도록 설계되었다. 제안된 가속기의 연산오차는 Q1.23 고정소수점 출력에서 2비트, 즉 2/sup -21/ 까지의 연산오차를 가진다. LAME 부호화기[5]심리음향 모델의 SMR 오차는 테이블 보간법[4]을 사용할 경우에 비해 4dB이상 향상되었으며, 연산량은 총 4 MIPS 감소하였다. 제안한 하드웨어 가속기는 Verilog HDL로 기술되었으며, SYNOPSYS에서 0.18㎛ CMOS 표준 셀 라이브러리 공정으로 합성되었다. 합성 면적은 7514 게이트이며 초월함수 연산에 대한 동작속도는 3 사이클이다.

  • PDF

Design of Square Root and Inverse Square Root Arithmetic Units for Mobile 3D Graphic Processing (모바일 3차원 그래픽 연산을 위한 제곱근 및 역제곱근 연산기 구조 및 설계)

  • Lee, Chan-Ho
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.46 no.3
    • /
    • pp.20-25
    • /
    • 2009
  • We propose hardware architecture of floating-point square root and inverse square root arithmetic units using lookup tables. They are used for lighting engines and shader processor for 3D graphic processing. The architecture is based on Taylor series expansion and consists of lookup tables and correction units so that the size of look-up tables are reduced. It can be applied to 32 bit floating point formats of IEEE-754 and reduced 24 bit floating point formats. The square root and inverse square root arithmetic units for 32 bit and 24 bit floating format number are designed as the proposed architecture. They can operation in a single cycle, and satisfy the precision of $10^{-5}$ required by OpenGL 1.x ES. They are designed using Verilog-HDL and the RTL codes are verified using an FPGA.

Measurement and Analysis of Power Dissipation of Value Speculation in Superscalar Processors (슈퍼스칼라 프로세서에서 값 예측을 이용한 모험적 실행의 전력소모 측정 및 분석)

  • 이상정;이명근;신화정
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.30 no.12
    • /
    • pp.724-735
    • /
    • 2003
  • In recent high-performance superscalar processors, the result value of an instruction is predicted to improve instruction-level parallelism by breaking data dependencies. Using those predicted values, instructions are speculatively executed and substantial performance can be gained. It, however, requires additional power consumption due to the frequent access and update of the value prediction table. In this paper, first, the trade-off between the performance improvement and the increased power consumption for value prediction is measured and analyzed. And, in order to reduce additional power consumption without performance loss, the technique of controlling speculative execution with confidence counter and predicting useful instructions is developed. Also, in order to prove the validity, a tool is developed that can simulate processor behavior at cycle-level and measure total energy consumption and power consumption per cycle.

Design of Efficient Gradient Orientation Bin and Weight Calculation Circuit for HOG Feature Calculation (HOG 특징 연산에 적용하기 위한 효율적인 기울기 방향 bin 및 가중치 연산 회로 설계)

  • Kim, Soojin;Cho, Kyeongsoon
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.51 no.11
    • /
    • pp.66-72
    • /
    • 2014
  • Histogram of oriented gradient (HOG) feature is widely used in vision-based pedestrian detection. The interpolation is the most important technique in HOG feature calculation to provide high detection rate. In interpolation technique of HOG feature calculation, two nearest orientation bins to gradient orientation for each pixel and the corresponding weights are required. In this paper, therefore, an efficient gradient orientation bin and weight calculation circuit for HOG feature is proposed. In the proposed circuit, pre-calculated values are defined in tables to avoid the operations of tangent function and division, and the size of tables is minimized by utilizing the characteristics of tangent function and weights for each gradient orientation. Pipeline architecture is adopted to the proposed circuit to accelerate the processing speed, and orientation bins and the corresponding weights for each pixel are calculated in two clock cycles by applying efficient coarse and fine search schemes. Since the proposed circuit calculates gradient orientation for each pixel with the interval of $1^{\circ}$ and determines both orientation bins and weights required in interpolation technique, it can be utilized in HOG feature calculation to support interpolation technique to provide high detection rate.

Measuring Method of Worst-case Execution Time by Analyzing Relation between Source Code and Executable Code (소스코드와 실행코드의 상관관계 분석을 통한 최악실행시간 측정 방법)

  • Seo, Yongjin;Kim, Hyeon Soo
    • Journal of Internet Computing and Services
    • /
    • v.17 no.4
    • /
    • pp.51-60
    • /
    • 2016
  • Embedded software has requirements such as real-time and environment independency. The real-time requirement is affected from worst-case execution time of loaded tasks. Therefore, to guarantee real-time requirement, we need to determine a program's worst-case execution time using static analysis approach. However, the existing methods for worst-case execution time analysis do not consider the environment independency. Thus, in this paper, in order to provide environment independency, we propose a method for measuring task's execution time from the source codes. The proposed method measures the execution time through the control flow graph created from the source codes instead of the executable codes. However, the control flow graph created from the source code does not have information about execution time. Therefore, in order to provide this information, the proposed method identifies the relationships between statements in the source code and instructions in the executable code. By parameterizing those parts that are dependent on processors based on the relationships, it is possible to enhance the flexibility of the tool that measures the worst-case execution time.