• Title/Summary/Keyword: Pipelined

Search Result 377, Processing Time 0.019 seconds

Consolidation of Subtasks for Target Task in Pipelined NLP Model

  • Son, Jeong-Woo;Yoon, Heegeun;Park, Seong-Bae;Cho, Keeseong;Ryu, Won
    • ETRI Journal
    • /
    • v.36 no.5
    • /
    • pp.704-713
    • /
    • 2014
  • Most natural language processing tasks depend on the outputs of some other tasks. Thus, they involve other tasks as subtasks. The main problem of this type of pipelined model is that the optimality of the subtasks that are trained with their own data is not guaranteed in the final target task, since the subtasks are not optimized with respect to the target task. As a solution to this problem, this paper proposes a consolidation of subtasks for a target task ($CST^2$). In $CST^2$, all parameters of a target task and its subtasks are optimized to fulfill the objective of the target task. $CST^2$ finds such optimized parameters through a backpropagation algorithm. In experiments in which text chunking is a target task and part-of-speech tagging is its subtask, $CST^2$ outperforms a traditional pipelined text chunker. The experimental results prove the effectiveness of optimizing subtasks with respect to the target task.

Register-Based Parallel Pipelined Scheme for Synchronous DRAM (동기식 기억소자를 위한 레지스터를 이용한 병렬 파이프라인 방식)

  • Song, Ho Jun
    • Journal of the Korean Institute of Telematics and Electronics A
    • /
    • v.32A no.12
    • /
    • pp.108-114
    • /
    • 1995
  • Recently, along wtih the advance of high-performance system, synchronous DRAM's (SDRAM's) which provide consecutive data output synchronized with an external clock signal, have been reported. However, in the conventional SDRAM's which utilize a multi-stage serial pipelined scheme, the column path is divided into multi-stages depending on CAS latency N. Thus, as the operating speed and CAS latency increase, new stages must be added, thereby causing a large area penalty due to additinal latches and I/O lines. In the proposed register-based parallel pipelined scheme, (N-1) registers are located between the read data bus line pair and the data output buffer and the coming data are sequentially stored. Since the column data path is not divided and the read data is directly transmitted to the registers, the busrt read operation can easily be achieved at higher frequencies without a large area penalty and degradation of internal timing margin. Simulation results for 0.32um-Tech. 4-Bank 64M SDRAM show good operation at 200MHz and an area increment is less than 0.1% when CAS latency N is increased from 3 to 4.. This pipelined scheme is more advantageous as the operating frequency increases.

  • PDF

Pipelined Parallel Processing System for Image Processing (영상처리를 위한 Pipelined 병렬처리 시스템)

  • Lee, Hyung;Kim, Jong-Bae;Choi, Sung-Hyk;Park, Jong-Won
    • Journal of IKEEE
    • /
    • v.4 no.2 s.7
    • /
    • pp.212-224
    • /
    • 2000
  • In this paper, a parallel processing system is proposed for improving the processing speed of image related applications. The proposed parallel processing system is fully synchronous SIMD computer with pipelined architecture and consists of processing elements and a multi-access memory system. The multi-access memory system is made up of memory modules and a memory controller, which consists of memory module selection module, data routing module, and address calculating and routing module, to perform parallel memory accesses with the variety of types: block, horizontal, and vertical access way. Morphological filter had been applied to verify the parallel processing system and resulted in faithful processing speed.

  • PDF

A 32-bit Pipelined Carry-select Adder Using the Complementary Scheme (보수 이론을 이용한 32비트 파이프라인 캐리 선택 가산기)

  • Kim, Young-Joon;Kim, Lee-Sup
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.39 no.9
    • /
    • pp.55-61
    • /
    • 2002
  • Using the carry-select adder scheme, an adder with small number of stages can be operated as fast as an adder with large number of stages. In this paper, a 4-block 5-stage 32-bit pipelined carry-select adder is designed and implemented. The proposed adder operates as fast as a conventional 16-stage 32-bit pipelined adder while the number of registers required is nearly same as a conventional 4-stage pipelined adder. This adder is operated at 1.67GHz clock frequency in a standard 0.25um CMOS technology with 2.5 V supply voltage.

A Study of an 8-b${\times}$8-b Adiabatic Pipelined Multiplier with Simplified Supply Clock Generator (단열회로를 이용한 8-b${\times}$8-b 파이프라인 승산기와 개선된 전원클럭 발생기의 연구)

  • Moon, Yong
    • Journal of the Institute of Electronics Engineers of Korea SD
    • /
    • v.38 no.4
    • /
    • pp.285-291
    • /
    • 2001
  • An 8-b$\times$8-b adiabatic pipelined multiplier is designed. Simplified four phase clock generator is also designed to provide supply clocks for adiabatic circuits. All the clock line charge on the capacitive interconnections is recovered to save energy. Adiabatic circuits are designed based on ECRL(efficient charge recovery logic) and are integrated using 0.6${\mu}{\textrm}{m}$ CMOS technology. The efficiency of proposed supply clock generator is better than the previous one by 4~11%. Simulation results show that the power consumption of adiabatic pipelined multiplier is reduced by a factor of 2.6~3.5 compared to a conventional pipelined CMOS multiplier.

  • PDF

Optimization of Pipelined Discrete Wavelet Packet Transform Based on an Efficient Transpose Form and an Advanced Functional Sharing Technique

  • Nguyen, Hung-Ngoc;Kim, Cheol-Hong;Kim, Jong-Myon
    • Journal of Information Processing Systems
    • /
    • v.15 no.2
    • /
    • pp.374-385
    • /
    • 2019
  • This paper presents an optimal implementation of a Daubechies-based pipelined discrete wavelet packet transform (DWPT) processor using finite impulse response (FIR) filter banks. The feed-forward pipelined (FFP) architecture is exploited for implementation of the DWPT on the field-programmable gate array (FPGA). The proposed DWPT is based on an efficient transpose form structure, thereby reducing its computational complexity by half of the system. Moreover, the efficiency of the design is further improved by using a canonical-signed digit-based binary expression (CSDBE) and advanced functional sharing (AFS) methods. In this work, the AFS technique is proposed to optimize the convolution of FIR filter banks for DWPT decomposition, which reduces the hardware resource utilization by not requiring any embedded digital signal processing (DSP) blocks. The proposed AFS and CSDBE-based DWPT system is embedded on the Virtex-7 FPGA board for testing. The proposed design is implemented as an intellectual property (IP) logic core that can easily be integrated into DSP systems for sub-band analysis. The achieved results conclude that the proposed method is very efficient in improving hardware resource utilization while maintaining accuracy of the result of DWPT.

VHDL Design for Out-of-Order Superscalar Processor of A Fully Pipelined Scheme (완전한 파이프라인 방식의 비순차실행 수퍼스칼라 프로세서의 VHDL 설계)

  • Lee, Jongbok
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.21 no.1
    • /
    • pp.99-105
    • /
    • 2021
  • Today, a superscalar processor is the basic unit or an essential component of a multi-core processor, SoCs, and GPUs. Hence, a high-performance out-of-order superscalar processor must be adopted for these systems to maximize its performance. The superscalar processor fetches, issues, executes, and writes back multiple instructions per cycle by utilizing reorder buffers and reservation stations to dynamically schedule instructions in a pipelined scheme. In this paper, a fully pipelined out-of-order superscalar processor with speculative execution is designed with VHDL and verified with GHDL. As a result of the simulation, the program composed of ARM instructions is successfully performed.

A New Pipelined Binary Search Architecture for IP Address Lookup (IP 어드레스 검색을 위한 새로운 pipelined binary 검색 구조)

  • Lim Hye-Sook;Lee Bo-Mi;Jung Yeo-Jin
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.29 no.1B
    • /
    • pp.18-28
    • /
    • 2004
  • Efficient hardware implementation of address lookup is one of the most important design issues of internet routers. Address lookup significantly impacts router performance since routers need to process tens-to-hundred millions of packets per second in real time. In this paper, we propose a practical IP address lookup structure based on the binary tree of prefixes of different lengths. The proposed structure produces multiple balanced trees, and hence it solve the issues due to the unbalanced binary prefix tree of the existing scheme. The proposed structure is implemented using pipelined binary search combined with a small size TCAM. Performance evaluation results show that the proposed architecture requires a 2000-entry TCAM and total 245 kbyte SRAMs to store about 30,000 prefix samples from MAE-WEST router, and an address lookup is achieved by a single memory access. The proposed scheme scales very well with both of large databases and longer addresses as in IPv6.

A 1.8 V 40-MS/sec 10-bit 0.18-㎛ CMOS Pipelined ADC using a Bootstrapped Switch with Constant Resistance

  • Eo, Ji-Hun;Kim, Sang-Hun;Kim, Mun-Gyu;Jang, Young-Chan
    • Journal of information and communication convergence engineering
    • /
    • v.10 no.1
    • /
    • pp.85-90
    • /
    • 2012
  • A 40-MS/sec 10-bit pipelined analog to digital converter (ADC) with a 1.2 Vpp differential input signal is proposed. The implemented pipelined ADC consists of eight stages of 1.5 bit/stage, one stage of 2 bit/stage, a digital error correction block, band-gap reference circuit & reference driver, and clock generator. The 1.5 bit/stage consists of a sub-ADC, digital to analog (DAC), and gain stage, and the 2.0 bit/stage consists of only a 2-bit sub-ADC. A bootstrapped switch with a constant resistance is proposed to improve the linearity of the input switch. It reduces the maximum VGS variation of the conventional bootstrapped switch by 67%. The proposed bootstrapped switch is used in the first 1.5 bit/stage instead of a sample-hold amplifier (SHA). This results in the reduction of the hardware and power consumption. It also increases the input bandwidth and dynamic performance. A reference voltage for the ADC is driven by using an on-chip reference driver without an external reference. A digital error correction with a redundancy is also used to compensate for analog noise such as an input offset voltage of a comparator and a gain error of a gain stage. The proposed pipelined ADC is implemented by using a 0.18-${\mu}m$ 1- poly 5-metal CMOS process with a 1.8 V supply. The total area including a power decoupling capacitor and the power consumption are 0.95 $mm^2$ and 51.5 mW, respectively. The signal-to-noise and distortion ratio (SNDR) is 56.15 dB at the Nyquist frequency, resulting in an effective number of bits (ENOB) of 9.03 bits.

8.3 Gbps pipelined LEA Crypto-Processor Supporting ECB/CTR Modes of operation (ECB/CTR 운영모드를 지원하는 8.3 Gbps 파이프라인 LEA 암호/복호 프로세서)

  • Sung, Mi-Ji;Shin, Kyung-Wook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.12
    • /
    • pp.2333-2340
    • /
    • 2016
  • A LEA (Lightweight Encryption Algorithm) crypto-processor was designed, which supports three master key lengths of 128/ 192/256-bit, ECB and CTR modes of operation. To achieve high throughput rate, the round transformation block was designed with 128 bits datapath and a pipelined structure of 16 stages. Encryption/decryption is carried out through 12/14/16 pipelined stages according to the master key length, and each pipelined stage performs round transformation twice. The key scheduler block was optimized to share hardware resources that are required for encryption, decryption, and three master key lengths. The round keys generated by key scheduler are stored in 32 round key registers, and are repeatedly used in round transformation until master key is updated. The pipelined LEA processor was verified by FPGA implementation, and the estimated performance is about 8.3 Gbps at the maximum clock frequency of 130 MHz.