• Title/Summary/Keyword: 레지스터

Search Result 505, Processing Time 0.028 seconds

An Efficient Algorithm for Sparse Code Motion (희소코드모션을 위한 효율적인 알고리즘)

  • Shin Hyun-Deok;Yu Heui-Jong;Ahn Heui-Hak
    • The KIPS Transactions:PartA
    • /
    • v.12A no.1 s.91
    • /
    • pp.79-86
    • /
    • 2005
  • This paper suggests that sparse code motion algorithm should be used to make the code optimal in the respect of computation and lifetime. This algorithm Is SpCM algorithm, which expand BCM and LCM algorithm. BCM algorithm carries out the optimal code motion computationally and LCM algorithm reduces the register pressure in SpCM algorithm. Generally, code motion algorithm accomplishes the run-time optimal connected with the optimum of computation and the register pressure. Computational cost and consideration of the code size in the register pressure are also added in the paper. The optimum of code motion could be obtained through SpCM algorithm, which considers the code size, in audition to computational optimal and lifetime optimal. The algorithm presented in this paper is the most optimal algorithm in the respect of computation and lifetime, as all the unnecessary code motions are restrained.

Multi-Port Register File Design and Implementation for the SIMD Programmable Shader (SIMD 프로그래머블 셰이더를 위한 멀티포트 레지스터 파일 설계 및 구현)

  • Yoon, Wan-Oh;Kim, Kyeong-Seob;Cheong, Jin-Ha;Choi, Sang-Bang
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.9
    • /
    • pp.85-95
    • /
    • 2008
  • Characteristically, 3D graphic algorithms have to perform complex calculations on massive amount of stream data. The vertex and pixel shaders have enabled efficient execution of graphic algorithms by hardware, and these graphic processors may seem to have achieved the aim of "hardwarization of software shaders." However, the hardware shaders have hitherto been evolving within the limits of Z-buffer based algorithms. We predict that the ultimate model for future graphic processors will be an algorithm-independent integrated shader which combines the functions of both vertex and pixel shaders. We design the register file model that supports 3-dimensional computer graphic on the programmable unified shader processor. we have verified the accurate calculated value using FPGA Virtex-4(xcvlx200) made by Xilinx for operating binary files made by the implementation progress based on synthesis results.

A Partial Access Mechanism on a Register for Low-cost Embedded Multimedia ASIP (저비용 내장형 멀티미디어 프로세서를 위한 분할 레지스터 접근 구조)

  • Joe, Min-Young;Jeong, Ha-Young;Lee, Yong-Surk
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.45 no.9
    • /
    • pp.50-56
    • /
    • 2008
  • In this paper, we propose a partial access mechanism for low cost multimedia processors. Due to the cost increase of adding the SIMD register files and the execution blocks, we experience difficulties applying the SIMD instructions to low cost multimedia embedded processors. The proposed mechanism has the advantages of decreasing the cost burden of the additional hardware and enhancing total performance of the SIMD operation. We designed the ASIP in which the mechanism is applied and compared the latency of the SIMD operation regarding the use of instruction sets in the DSP benchmark. Then, we analyzed the total performance enhancement and the reduction in area burden by synthesizing the ASIP using 0.25um TSMC CMOS technology. As a result, there are approximately a 38% of performance increase and a 13.4% of area increase according to the proposed mechanism simulation.

Hardware Design of the Synchronizer and the Demodulator of a 18000-3 PJM Mode Tag (18000-3 PJM 모드 태그의 동기부 및 복조부 하드웨어 설계)

  • Jeon, Don-Guk;Yang, Hoon-Gee
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.10 no.2
    • /
    • pp.77-83
    • /
    • 2011
  • In this paper, we present the design procedure of the synchronizer and the demodulator of a 13.56MHz RFID PJM tag, which was standardized in ISO 18000-3 mode 3. We optimize the algorithms in order to minimize the number of registers and implement them based on international standard. The designed module is simulated by Modelsim and FPGA. The synchronizer is composed of 3 correlators that is implemented by 1,024(16bit ${\times}$ 64cycle) registers. The demodulator is composed of 2 correlators that is implemented by 128(2bit ${\times}$ 64cycle) registers. The simulation performed with the demodulator integrated with the synchronizer shows that it works at about 87% success rate with the test data of SNR -2dB and 100% with those of SNR 4dB.

Design of a Dispatch Unit & Operand Selection Unit for Improving the SIMT Based GP-GPU Instruction Performance (SIMT구조 GP-GPU의 명령어 처리 성능 향상을 위한 Dispatch Unit과 Operand Selection Unit설계)

  • Kwak, Jae Chang
    • Journal of IKEEE
    • /
    • v.19 no.3
    • /
    • pp.455-459
    • /
    • 2015
  • This paper proposes a dispatch unit of GP-GPU with SIMT architecture to support the acceleration of general-purpose operation as well as graphics processing. If all the information of an operand used instructions issued from the warp scheduler is decoded, an unnecessary operand load occurs, resulting in register loads. To resolve this problem, this paper proposes a method that can reduce the operand load and the load on the resister by decoding only the information of the operand using a pre-decoding method. The operand information from the dispatch unit is passed to the operand selection unit with preventing register bank collisions. Thus the overall performance are improved. In the simulation test, the total clock cycles required by processing 10,000 arbitrary instructions issued from the wrap scheduler using ModelSim SE 10.0b are measured. It shows that the application of the dispatch unit equipped with the pre-decoding function proposed in this paper can make an improvement of about 12% in processing performance compared to the conventional method.

A Design of a Register Insertion Backbone Ring Network (레이스터 인서션 Backbone 링 네트워크에 관한 연구)

  • 강철신
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.17 no.8
    • /
    • pp.796-804
    • /
    • 1992
  • This paper presents a design of a backbone network which uses a rigister-Insertion ring structure, The introduction of a high speed register in sertton backbone ring enables high performance inter-network 4ommunicatlons In a simple and modular structure at low cost and Its concurrent communications.. Two or more bridge nodes can be used to construct a register Insertion backbone ring network. The high bandwidth of the backbone ring sup ports heavy traffic for Inter-segment Eornrnunicatlons. The bridge node does both local address filtering to block data entering the ring and remote address filtering to block data entering the local LAN segment . Title local address greatly reduces the rate on the backbone ring and the remote address filterlng greatly reduces the traffic rate on each LAN segment. An feature makes the network the network reconflguratlon simpler and transparent to users. A throughput analysis Is used to deterrune the bandwidth of the backbone rlr)g transmission medium.

  • PDF

An Efficient Code Expansion from EM to SPARC Code (EM에서 SPARC 코드로 효율적인 코드 확장)

  • Oh, Se-Man;Yun, Young-Shick
    • The Transactions of the Korea Information Processing Society
    • /
    • v.4 no.10
    • /
    • pp.2596-2604
    • /
    • 1997
  • There are two kinds of backends in ACK:code generator(full-fledged backend) and code expander(fast backend). Code generators generate target code using string pattern matching and code expanders generate target code using macro expansion. ACK translates EM to SPARC code using code expander. The corresponding SPARC code sequences for a EM code are generated and then push-pop optimization is performed. But, there is the problem of maintaining hybrid stack. And code expander is not considered to passes parameters of a procedure call through register windows. The purpose of this paper is to improve SPARC code quality. We suggest a method of SPARC cod generation using EM tree. Our method is divided into two phases:EM tree building phase and code expansion phase. The EM tree building phase creates the EM tree and code expansion phase translates it into SPARC code. EM tree is designed to pass parameters of a procedure call through register windows. To remove hybrid stack, we extract an additional information from EM code. We improved many disadvantages that arise from code expander in ACK.

  • PDF

Design of an Optimized GPGPU for Data Reuse in DeepLearning Convolution (딥러닝 합성곱에서 데이터 재사용에 최적화된 GPGPU 설계)

  • Nam, Ki-Hun;Lee, Kwang-Yeob;Jung, Jun-Mo
    • Journal of IKEEE
    • /
    • v.25 no.4
    • /
    • pp.664-671
    • /
    • 2021
  • This paper proposes a GPGPU structure that can reduce the number of operations and memory access by effectively applying a data reuse method to a convolutional neural network(CNN). Convolution is a two-dimensional operation using kernel and input data, and the operation is performed by sliding the kernel. In this case, a reuse method using an internal register is proposed instead of loading kernel from a cache memory until the convolution operation is completed. The serial operation method was applied to the convolution to increase the effect of data reuse by using the principle of GPGPU in which instructions are executed by the SIMT method. In this paper, for register-based data reuse, the kernel was fixed at 4×4 and GPGPU was designed considering the warp size and register bank to effectively support it. To verify the performance of the designed GPGPU on the CNN, we implemented it as an FPGA and then ran LeNet and measured the performance on AlexNet by comparison using TensorFlow. As a result of the measurement, 1-iteration learning speed based on AlexNet is 0.468sec and the inference speed is 0.135sec.

Understanding of the Linguistic Features of Earth Science Treatises: Register Analysis Approach (지구과학 논문의 언어 특성 이해: 레지스터 분석)

  • Maeng, Seung-Ho;Shin, Myung-Hwan;Cha, Hyun-Jung;Ham, Seok-Jin;Shin, Hyeon-Jeong;Kim, Chan-Jong
    • Journal of the Korean earth science society
    • /
    • v.31 no.7
    • /
    • pp.785-797
    • /
    • 2010
  • This study identified the linguistic features of Earth science treatises through the analysis of the register. Data included three Korean treatises that were in geology, atmospheric science, and oceanography. The register of Earth science treatise was as follows: First, there were semantic, referential connections between Themes and Rhemes, that the messages and main points of the texts were expressed coherently and cohesively. Second, some predicates were used which were related to deductive inference, abductive inferences, or causal relation according to the genre elements of each text. The logical relations were not represented by the conjunctions but by the types of predicates. Third, most texts in the treatises showed interpersonally weak relationship using mental predicates related to possibilities, which meant scientists expressed indirectly their interpretation, explanation, or arguments. From these results, we argued that some activities of unpacking the language of science be included in science curriculum in order to improve students' literacy of science texts and understanding scientists' knowledge construction.

VLSI Design of DWT-based Image Processor for Real-Time Image Compression and Reconstruction System (실시간 영상압축과 복원시스템을 위한 DWT기반의 영상처리 프로세서의 VLSI 설계)

  • Seo, Young-Ho;Kim, Dong-Wook
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.1C
    • /
    • pp.102-110
    • /
    • 2004
  • In this paper, we propose a VLSI structure of real-time image compression and reconstruction processor using 2-D discrete wavelet transform and implement into a hardware which use minimal hardware resource using ASIC library. In the implemented hardware, Data path part consists of the DWT kernel for the wavelet transform and inverse transform, quantizer/dequantizer, the huffman encoder/huffman decoder, the adder/buffer for the inverse wavelet transform, and the interface modules for input/output. Control part consists of the programming register, the controller which decodes the instructions and generates the control signals, and the status register for indicating the internal state into the external of circuit. According to the programming condition, the designed circuit has the various selective output formats which are wavelet coefficient, quantization coefficient or index, and Huffman code in image compression mode, and Huffman decoding result, reconstructed quantization coefficient, and reconstructed wavelet coefficient in image reconstructed mode. The programming register has 16 stages and one instruction can be used for a horizontal(or vertical) filtering in a level. Since each register automatically operated in the right order, 4-level discrete wavelet transform can be executed by a programming. We synthesized the designed circuit with synthesis library of Hynix 0.35um CMOS fabrication using the synthesis tool, Synopsys and extracted the gate-level netlist. From the netlist, timing information was extracted using Vela tool. We executed the timing simulation with the extracted netlist and timing information using NC-Verilog tool. Also PNR and layout process was executed using Apollo tool. The Implemented hardware has about 50,000 gate sizes and stably operates in 80MHz clock frequency.