# **Energy Efficient Processing Engine in LDPC Application with High-Speed Charge Recovery Logic**

Yimeng Zhang, Mengshu Huang, Nan Wang, Satoshi Goto, and Tsutomu Yoshihara

*Abstract*: This paper presents a Processing Engine (PE) for Low-Density Parity-Check (LDPC) applications, built with a novel charge-recovery logic called pseudo-NMOS boost logic (pNBL) to achieve high speed and low power dissipation. pNBL is a high-overdrive, low-area charge recovery logic that belongs to the boost logic family. The proposed Processing Engine is used in the LDPC circuit to reduce operating power dissipation and increase processing speed. To demonstrate the performance of the proposed PE, a test chip is designed and fabricated in 0.18 µm CMOS technology. Simulation results indicate that the proposed PE with pNBL dissipates only 1 pJ/cycle when working at 403 MHz, which is only 36% of the energy dissipated by a PE built with conventional static CMOS gates. Measurement results show that the test chip can work at up to 609 MHz with an energy dissipation of 2.1 pJ/cycle.

*Index Terms*: Adiabatic logic, boost logic, low energy dissipation, processing engine

## I. INTRODUCTION

In recent research, charge recovery logic has been restudied because of its low power dissipation. In numerous publications [1-7], charge recovery logics have demonstrated their low-power advantage. Charge recovery logic achieves low power because circuit energy is conserved rather than dissipated as heat. A number of factors, such as transition rate and threshold voltage, affect the energy efficiency of charge recovery logic, but it can work with lower energy dissipation than $CV^2/2$, which is the typical energy dissipation of static CMOS logic [8]. As the operating frequency of logic circuits increases, however, the charge recovery logics in Refs. [1-7] can hardly work correctly.
To satisfy the requirement for high operating frequency, a charge recovery logic family named boost logic [9-12] has been proposed. Boost logic uses a two-phase non-overlapping clock signal as the power clock to drive circuits; its operating frequency can reach the gigahertz range while power dissipation stays low compared with conventional static CMOS. Most charge recovery logic has a waiting state while the power clock is low, so performance is limited because the circuits are inactive for half of each clock cycle. The structure of the boost logic family avoids this problem. Fig. 1 illustrates the operation principle of the boost logic family: a gate in the boost family consists of two parts working in two non-overlapping time intervals; logic values are first evaluated in the logic block and then amplified to larger voltage values in the boost block. With this two-part structure, boost logic can use the time of the waiting state to improve operating performance. Because boost logic is driven by a power clock, it is well suited to synchronous sequential circuitry.

Fig. 1. Operation principle of boost logic family.

Manuscript received Dec. 28, 2011; revised Apr. 26, 2012. The authors are with the Graduate School of Information, Production and Systems, Waseda University, Kitakyushu, Fukuoka, 808-0135, Japan. E-mail: yimeng.zhang@fuji.waseda.jp

However, among the boost logics, BL, SBL, and EBL require a DC power supply. Take EBL as an example: two-phase clock signals are required for adjacent EBL gates, and a DC supply VCC is also required to drive the evaluation blocks. Thus three power supplies are required, which increases circuit complexity, and DC power is still dissipated by the evaluation blocks.

The low-density parity-check (LDPC) codec is a high-code-rate codec that has been widely researched in recent years [13], and LDPC chips are broadly used in many wireless communication fields for error correction.
In wireless communication applications, power dissipation is becoming more and more critical because it directly affects battery life. Reducing the power dissipation of LDPC chips is an important topic in recent research, and the main low-power approaches focus on architecture and algorithm [14]. Both approaches increase circuit complexity in order to reach low power, and in some conditions a trade-off between speed and power dissipation has to be considered. An alternative method of lowering the power dissipation of an LDPC chip is proposed in this paper: rather than focusing on architecture or algorithm, it uses a novel low-power circuit technique.

In this paper, a novel charge recovery logic called pseudo-NMOS boost logic (pNBL) [15] is proposed. pNBL has the features of boost logic: operation is divided into an evaluation stage and a boost stage, and the operating frequency can reach the gigahertz range. Moreover, the drivability of pNBL is higher than that of other boost logics, and unlike BL, SBL, and EBL, no DC power supply is required. To verify the effect of the proposed pNBL in reducing the power dissipation of an LDPC chip, a critical sequential circuit module called the Processing Engine (PE), which accounts for a large portion of the power dissipation, is constructed. By demonstrating that the power dissipation of the PE can be reduced by using pNBL, it follows that the total power of the LDPC chip can also be reduced.

This paper is organized as follows. In Section II, the operation of pseudo-NMOS boost logic is discussed, and a comparison between pNBL and the previous boost logics (i.e., PBL [12] and EBL [10]) is described. The structure of the Processing Engine is discussed in Section III. The simulation results and test chip measurement results are presented in Section IV, together with a comparison with conventional static CMOS and other charge recovery logics. In Section V, the performance of the proposed pNBL and PE is summarized and conclusions are given.

## II. PSEUDO-NMOS BOOST LOGIC

In this section, the structure and operation of pseudo-NMOS and pNBL are first discussed, and then the power dissipation of pNBL is analyzed.

### 1. Pseudo-NMOS Circuits

Pseudo-NMOS [16] is a CMOS ratioed logic, as shown in Fig. 2. The pull-down network (PDN) is the same as in static CMOS gates, while the pull-up network (PUN) is replaced with a single PMOS transistor whose gate is grounded. Since the PMOS is always on, once the PDN is shut off the output is pulled up to $V_{dd}$. Compared with static CMOS circuits, pseudo-NMOS circuits have lower input capacitance and smaller layout area, because a large number of PMOS transistors in the PUN are eliminated. On the other hand, pseudo-NMOS circuits also have a disadvantage due to their structure: the sizing ratio of the PMOS has to be adjusted carefully to provide sufficient pull-up ability. When the output is logic '0', both the PMOS and the PDN are on, and the current is larger than in static CMOS, which causes large static power dissipation. This disadvantage limits the application of pseudo-NMOS circuits.

Fig. 2. Structure of pseudo-NMOS [16].

In charge recovery logic, the current is recycled and reused with an *LC* tank. Therefore the disadvantage of pseudo-NMOS can be overcome in charge recovery logic. With this idea, a novel charge recovery logic called pseudo-NMOS boost logic is proposed.

### 2. Pseudo-NMOS Boost Logic

Fig. 3 shows the structure of a pulse boost logic (PBL) gate. PBL uses the voltage difference between clk and $\overline{\text{clk}}$ in the non-overlapping intervals, and this feature makes the DC power supply unnecessary. However, PBL requires 4 complementary networks (pull-up and pull-down) to generate the logic values, so a large number of transistors are required.

Pseudo-NMOS boost logic (pNBL), illustrated in Fig. 4, is an enhancement of PBL. In pNBL, the PUN in each rail is replaced with a single PMOS transistor, and the gates of these PMOS transistors are connected to clk.

Fig. 3.
Structure of PBL [12].

Fig. 4. Proposed structure of pNBL.

Compared with PBL, pNBL uses fewer transistors in the evaluation block, and as a result the input capacitance of pNBL is smaller than that of PBL. Since the input capacitance is the load capacitance of other pNBL gates, the power dissipation caused by load capacitance is lower than in PBL circuits.

As with other members of the boost logic family, the operation of pNBL is divided into an evaluation stage and a boost stage. In the evaluation stage, clk is in its low half cycle (lower voltage) while $\overline{\text{clk}}$ is in its high half cycle (higher voltage). When clk is low, PMOS transistors M7 and M8 are on and act as the pull-up network, and the complementary inputs turn one PDN on and the other off. This operation is the same as in pseudo-NMOS circuits. Because PMOS transistors are used to pull up the logic '1' value, there is no threshold voltage loss in the evaluation stage, unlike in PBL. During this stage, since $\overline{\text{clk}}$ is high, pass gates M5 and M6 are both on, and the voltage values calculated by the evaluation block pass through the pass gates to the boost block. In the evaluation stage, the voltage generated by the boost block barely affects these values.

In the boost stage, clk is in its high half cycle while $\overline{\text{clk}}$ is in its low half cycle. Since $\overline{\text{clk}}$ is low, pass gates M5 and M6 are both off, so the evaluation block cannot affect the boost block. The voltage values generated in the evaluation stage are then latched in the boost block. The voltages on the two sides are different; take the $V_{out} > \overline{V_{out}}$ case as an example: M1 and M4 are off while M2 and M3 are on, so $V_{out}$ ramps up by following clk and $\overline{V_{out}}$ drops by following $\overline{\text{clk}}$. As a result, the output voltage is boosted to the amplitude of the clk voltage ($V_{dd}$ for logic '1' and 0 for logic '0').

Fig. 5(a) illustrates the buffer chain of pNBL, and Fig.
5(b) shows the output waveforms of pNBL gates buf1 and buf2. When buf1 is in the boost stage, out0 and $\overline{\text{out0}}$ are at high level and are output to buf2, which is in the evaluation stage. As shown in the simulation waveform, the signal has a half-cycle delay for each gate. For a conventional static CMOS flip-flop in synchronous sequential circuits, the delay for each gate is at least one cycle, so the delay of pNBL is reduced to half.

Fig. 5. pNBL buffer chain.

Compared with PBL, pNBL improves as follows. First, the complex pull-up networks are eliminated, and therefore more complex functions can be realized with the same number of transistors. Second, the crowbar current in the evaluation block is reduced, so the power dissipation of pNBL is smaller than that of PBL.

### 3. Energy Dissipation

The power dissipation analysis of pNBL is similar to that applied to PBL in Ref. [12]. To apply this analysis, some assumptions are required as preconditions. For the AC power supply analysis, signals are approximated as sinusoids, and all transistors are assumed to operate in the linear region. With this simplification, the energy dissipation can be analyzed with the average current instead of integrating the instantaneous current and voltage. With the average current, energy dissipation can be expressed as $E = I_A^2 RT$, where $I_A$ is the average current, $R$ is the equivalent resistance, and $T$ is the period.

The energy analysis is divided into two parts: energy dissipated in the evaluation block and in the boost block. In the evaluation block of pNBL, the signals output to the boost block are different for logic '1' and '0'. In the logic '1' case, when clk is lower than $V_{th}$ and $\overline{\text{clk}}$ is higher than $V_{dd} - V_{th}$, M7 (or M8) is on and the voltage follows clk; when clk is higher than $V_{th}$ and $\overline{\text{clk}}$ is lower than $V_{dd} - V_{th}$, M7 (or M8) turns off, and the voltage is kept at $V_{dd} - V_{th}$.
So the signal swings between $V_{dd}$ and $V_{dd} - V_{th}$ at the same frequency as clk. In the logic '0' case, the PDN is on, so when clk falls to 0 the voltage follows clk, but when clk rises above $V_{th}$ and $\overline{\text{clk}}$ falls below $V_{dd} - V_{th}$, the PDN turns off and the voltage is kept at $V_{th}$. So the signal swings between 0 and $V_{th}$ at the same frequency as clk. Assuming the operating frequency is $f$, the angular frequency is $\omega = 2\pi f$. The load capacitance of the evaluation block is the parasitic capacitance $C_{pass}$ of pass gates M5 and M6. Thus, the current amplitude for both the logic '0' and logic '1' conditions is $\omega C_{pass} V_{th}$. Due to the dual-rail structure of pNBL, the probabilities of logic '0' and '1' are the same. The energy dissipated in the evaluation block in one cycle is expressed as:

$$E_{eval} = \frac{1}{2} \left(\frac{1}{\sqrt{2}} I_0\right)^2 R_e T + \frac{1}{2} \left(\frac{1}{\sqrt{2}} I_1\right)^2 R_e T \approx \frac{2\pi^2 R_e C_{pass}}{T} C_{pass} V_{th}^2 \tag{1}$$

where $I_0$ and $I_1$ are the current amplitudes for logic '0' and logic '1', $R_e$ is the equivalent resistance of the transistors in the evaluation block, and $T$ is the time of one cycle.

In the boost block of pNBL, the analysis is also divided into the logic '0' case and the logic '1' case. In the logic '0' case, because the output follows clk in the evaluation stage and $\overline{\text{clk}}$ in the boost stage, the output voltage swings between 0 and $V_{th}$ in both half cycles, and therefore its frequency is twice that of the operating clock. In the logic '1' case, the output signal swings between $V_{dd}$ and $V_{dd} - V_{th}$ at the same frequency as the operating clock. Assuming the load capacitance of the pNBL gate is $C_L$, the current amplitudes are $2\omega C_L V_{th}$ and $\omega C_L V_{th}$ for the logic '0' and '1' cases, respectively.
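As a rough numerical check of the average-current model, the sketch below evaluates $E = I_{rms}^2 R T$ for the current amplitudes stated above and compares the evaluation-block result with the closed form of Eq. (1). The component values ($R_e = R_b = 1$ kΩ, $C_{pass} = 5$ fF, $C_L = 30$ fF, $V_{th} = 0.36$ V, 1 GHz clock) are illustrative assumptions, not values taken from the test chip, and the function names are hypothetical.

```python
import math


def sinusoid_energy(amplitude, resistance, period):
    """Energy dissipated in one period by a sinusoidal current of given amplitude."""
    i_rms = amplitude / math.sqrt(2)  # rms value of a sinusoid
    return i_rms ** 2 * resistance * period


def eval_block_energy(r_e, c_pass, v_th, period):
    """Evaluation block: logic '0' and '1' equally likely, same current amplitude."""
    omega = 2 * math.pi / period
    amp = omega * c_pass * v_th
    return 0.5 * sinusoid_energy(amp, r_e, period) + 0.5 * sinusoid_energy(amp, r_e, period)


def boost_block_energy(r_b, c_load, v_th, period):
    """Boost block: amplitude omega*C_L*V_th for logic '1', twice that for logic '0'."""
    omega = 2 * math.pi / period
    amp_one = omega * c_load * v_th
    amp_zero = 2 * omega * c_load * v_th
    return 0.5 * sinusoid_energy(amp_one, r_b, period) + 0.5 * sinusoid_energy(amp_zero, r_b, period)


# Assumed, illustrative values (not from the paper's measurements).
R_E = R_B = 1e3      # ohm
C_PASS = 5e-15       # farad
C_LOAD = 30e-15      # farad
V_TH = 0.36          # volt
PERIOD = 1e-9        # second (1 GHz)

e_eval = eval_block_energy(R_E, C_PASS, V_TH, PERIOD)
e_boost = boost_block_energy(R_B, C_LOAD, V_TH, PERIOD)
closed_form_eval = 2 * math.pi ** 2 * R_E * C_PASS ** 2 * V_TH ** 2 / PERIOD

print(f"E_eval (model)       = {e_eval:.3e} J")
print(f"E_eval (closed form) = {closed_form_eval:.3e} J")
print(f"E_boost (model)      = {e_boost:.3e} J")
print(f"E_eval / E_boost     = {e_eval / e_boost:.4f}")
```

With equal resistances, the ratio reduces to $(2/5)(C_{pass}/C_L)^2$, about 0.011 for the assumed capacitances, which illustrates why the evaluation block contributes little when $C_{pass}$ is much smaller than $C_L$.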
According to the analysis above, the energy dissipation of the boost block is expressed as:

$$\begin{split} E_{boost} &\approx \frac{1}{2} \left(\frac{1}{\sqrt{2}} \omega C_L V_{th}\right)^2 R_b T + \frac{1}{2} \left(\sqrt{2} \omega C_L V_{th}\right)^2 R_b T \\ &= \frac{5\pi^2 R_b C_L}{T} C_L V_{th}^2 \end{split} \tag{2}$$

where $R_b$ is the equivalent resistance of the boost block. As a result, the energy dissipation of pNBL is

$$\begin{split} E_{pNBL} &= E_{eval} + E_{boost} + E_{crowbar} \\ &\approx \frac{2\pi^2 R_e C_{pass}}{T} C_{pass} V_{th}^2 + \frac{5\pi^2 R_b C_L}{T} C_L V_{th}^2 + E_{crowbar} \end{split} \tag{3}$$

In the pNBL structure, the equivalent resistances $R_b$ and $R_e$ are comparable. The load capacitance $C_L$ is tens of femtofarads at gigahertz-level operating frequency, while the pass-gate capacitance $C_{pass}$ is only several femtofarads. Under these preconditions, the first term in Eq. (3) is much smaller than the second, so Eq. (3) simplifies to

$$E_{pNBL} \approx \frac{5\pi^2 R_b C_L}{T} C_L V_{th}^2 + E_{crowbar} \tag{4}$$

Eq. (4) shows that the contribution of the evaluation block is small enough to be negligible, and the main power dissipation occurs in the boost block.

The crowbar current occurs when the evaluated value is logic '0'; it flows from $\overline{\text{clk}}$ to clk. However, in the pseudo-NMOS structure, the pull-up PMOS transistors M7 and M8 have a very large on-resistance to make sure the PDN has larger drivability than the pull-up PMOS. As a result, the crowbar current is very small. Furthermore, the input signals rise and fall with the power clocks clk and $\overline{\text{clk}}$; therefore, even if the input waveform has relatively long rise and fall times, the voltage difference between the input signals and the power clock stays approximately the same during the evaluation stage, and this limits the crowbar current.

Compared with the energy dissipation of PBL in Eq. (5), the energy dissipation of pNBL is clearly smaller.
$$E_{PBL} \approx \frac{7\pi^{2}R_{b}C_{L}}{T}C_{L}V_{th}^{2} + E_{crowbar} \tag{5}$$

$$E_{EBL} \approx 0.045\alpha C_{L}V_{DD}^{2} + \frac{0.45\pi^{2}RC_{L}}{T}C_{L}V_{DD}^{2} + E_{crowbar} \tag{6}$$

Eq. (6) shows the energy dissipation of EBL [10]. It is also divided into three parts: the evaluation block, the boost block, and the crowbar current. Since the structures of the boost blocks are the same in pNBL and EBL, $E_{boost}$ is the same. The energy comparison is therefore made on the evaluation block and on the energy dissipated by the crowbar current. For the evaluation block, the evaluation blocks of pNBL only drive pass gates M5 and M6, so the load capacitance $C_{pass}$ (several femtofarads) is smaller than the load capacitance of the whole pNBL gate $C_L$ (tens of femtofarads). According to Ref. [10], $V_{th}$ is about $0.3V_{DD}$. Assuming an activity factor $\alpha$ of 1/2, even when the circuit works at gigahertz frequency, the energy dissipated in the evaluation block of pNBL is calculated to be only about 1/1000 of that in the evaluation block of EBL.

For the crowbar current, because the power supply for the evaluation block of EBL is DC and the rising and falling edges of the input waveform are relatively slow, the crowbar current of EBL is large. In pNBL, because the input signals rise and fall with the power clock, the crowbar current caused by slow rising and falling edges is not a problem. As a result, the energy dissipated by the crowbar current of pNBL is also smaller than that of EBL.

## III. DESIGN OF PROCESSING ENGINE

### 1. Introduction on PE

The Processing Engine (PE) is a central processing unit in the architecture of the LDPC decoding circuit. An LDPC decoder requires a parity check matrix (PCM) and employs a decoding algorithm called the message passing algorithm (MPA). In MPA, the log likelihood ratio (LLR) is used to combine messages. PEs are used to perform the LLR calculation. Fig. 6 shows a simplified block diagram of the PE.
The target of the PE is to find the minimum value of the input LLR information, subtract the offset factor, and use this offset value to update the check message. The inputs of the PE are two 5-bit signed numbers (1 sign bit and a 4-bit number), and by performing the MPA, the PE also outputs two 5-bit signed numbers. The PE module includes an adder, FIFO, multiplexer, and comparator, which are representative digital circuits.

Fig. 6. Block diagram of processing engine.

### 2. Design of PE with pNBL

The design of the PE uses a top-down design methodology. First, the overall structure and specification of the system are fixed; then the system is broken into small modules, and the circuits required in each module are identified. After all circuits are designed to meet the requirements, they are assembled into modules. The modules are simulated to verify that the combination of circuits meets the specification. In the final step, all modules are combined at the system level, and a system-level simulation is performed to verify the design.

The PE illustrated in Fig. 6 requires adders, a comparator, and an absolute value calculator. To reduce circuit latency, the logic complexity of each pNBL gate should be increased so that one pNBL gate realizes as many functions as possible. The schematic of a 5-bit pNBL comparator is shown in Fig. 7, which demonstrates the capability of pNBL for high-complexity functions. In the comparator, the output is logic '1' when a[4:0]≥b[4:0], and logic '0' otherwise. To realize this function, the stack height of the evaluation block is set to 5, and the simulation result indicates that the circuit can work at an operating frequency as high as 2.2 GHz with a power clock amplitude of 1.2 V.

Fig. 7. 5-bit comparator with pNBL.

In order to combine these circuits into the system, the layouts of all circuits are designed with the same power rail, which greatly reduces the complexity of layout design.
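The check-message update performed by the PE (take the minimum input magnitude, then subtract an offset) has the general form of the offset min-sum algorithm. The sketch below is a behavioral model under that assumption and is not the authors' exact datapath: the function name, the offset value of 1, the saturation handling, and the assumption that the 4 bits hold a magnitude are all illustrative choices.

```python
def offset_min_sum(llrs, offset=1, mag_bits=4):
    """Behavioral offset min-sum update for a list of signed integer LLRs.

    For each position, the output sign is the product of the signs of the
    other inputs, and the output magnitude is the minimum magnitude of the
    other inputs minus `offset`, clamped at zero.
    """
    if len(llrs) < 2:
        raise ValueError("need at least two inputs")
    max_mag = (1 << mag_bits) - 1  # 15 for an assumed 4-bit magnitude
    outputs = []
    for i in range(len(llrs)):
        others = llrs[:i] + llrs[i + 1:]
        # An odd number of negative inputs gives a negative output sign.
        sign = -1 if sum(1 for v in others if v < 0) % 2 else 1
        # Saturate each magnitude to the representable range, then take the minimum.
        min_mag = min(min(abs(v), max_mag) for v in others)
        outputs.append(sign * max(min_mag - offset, 0))
    return outputs


print(offset_min_sum([5, -3]))     # two inputs, matching the two-input PE
print(offset_min_sum([5, -3, 7]))  # the same rule applied to three inputs
```

The magnitude comparison inside `min` corresponds to the role of the comparator in the PE, and the subtraction of `offset` corresponds to the adder; a hardware implementation would operate on fixed-width words rather than Python integers.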
To apply the proposed pNBL gate in standard CMOS technology, the bulk terminals of all NMOS and PMOS transistors are connected to ground and 1.2 V, respectively. The interface circuits between conventional static CMOS and the proposed pNBL are shown in Fig. 8. With these interfaces, pNBL circuits are compatible with conventional static CMOS. The interface from static CMOS to pNBL is a pNBL buffer, which converts input DC-level signals into sine-wave-like signals whose setup and hold times satisfy the requirements of pNBL. The interface from pNBL to static CMOS is the same as that described in Ref. [12] and can be regarded as a 1-bit A/D converter. This data A/D converter samples data when the output is at its peak values, so the delay it causes is small compared with the cycle time of the power clock.

Fig. 8. Interface circuits between pNBL and static CMOS.
To generate the power clock, an LC tank is used to recycle energy from the circuits: the inductor stores energy as magnetic energy, and this energy is reused in the next cycle as electrical power. Since a 2-phase non-overlapping power clock is necessary to drive pNBL, a circuit called blip [17], shown in Fig. 9, is designed. The W/L ratio of the NMOS pair is set to 200 $\mu$m/1 $\mu$m, and the R in the figure is the DC resistance (DCR) of the on-chip inductor. The power supply is marked as $V_{CC}$ in Fig. 9, while in other figures the DC power supply for circuits is marked as $V_{dd}$, because the voltage values are different. The power supply of the clock generator, $V_{CC}$, is 0.8 V DC, and the amplitude of the generated power clock is 2×0.8=1.6 V, which is determined by the W/L ratio of the NMOS pair and the inductor current.

Fig. 9. Blip power clock generator.

To verify that the proposed system can work at a high operating frequency with high energy efficiency, a center-tapped on-chip inductor model is used in the simulation. By using an on-chip inductor, the whole system can be integrated on a chip. Although the blip power clock generator has advantages in integration and low power [17], its operating principle also brings disadvantages. The *Q* factor of an on-chip inductor is low, and therefore the power efficiency is limited. The influence of mutual inductance is another drawback. In a center-tapped inductor, mutual inductance exists and degrades the energy efficiency, but in this design the inductor is fully symmetric, so the influence of mutual inductance is the same for both ports of the inductor, and as a result the mutual inductance does not affect the operation of the clock generator much. Moreover, although the power dissipation of the clock generator is lower than that of conventional clock tree circuits, the area penalty of the clock generator is large because of the on-chip inductor.
In a large-scale circuit, the clock distribution network has to be designed carefully, since clock skew and jitter affect circuit performance. In conventional CMOS circuits, a buffer tree is used as the distribution network, but this method cannot be adopted in pNBL circuits, because when a power clock signal passes through a buffer powered by a DC supply, the energy can no longer be recycled and reused. The problem of clock jitter and skew has been studied in the literature [18, 19]. In this work, the scale of the proposed PE circuit is small enough to neglect the effect of clock skew and jitter.

The block diagram of the whole system is shown in Fig. 10; it includes the power clock generator, the PE with pNBL, and the interface between pNBL and static CMOS.

Fig. 10. Block diagram of PE system.

## IV. TEST CHIP OF PROPOSED PE

### 1. Simulation and Evaluation

To demonstrate the improvement in power, a conventional static CMOS PE with the same architecture is designed in the same $0.18~\mu m$ process technology used for the pNBL PE. Since pNBL is compatible with static CMOS, the functional verification method is the same as for conventional PE circuits: apply the input data pattern and verify the output data. The simulation shows that the proposed pNBL PE works correctly at a frequency of 1.5 GHz.

In this simulation an on-chip inductor is used, so the inductance of the blip generator can hardly be changed; therefore, the capacitance of the LC tank is changed to achieve various operating frequencies. The on-chip inductor is built with the top two metal layers, which are thin, and the diameter of the inductor is limited, so the *Q* value of the on-chip inductor is not very large (about 10 according to the HFSS simulation). Based on the HFSS simulation result, an on-chip inductor model with an inductance of 3 nH was designed.
The capacitor values are set to 6 pF, 10 pF, 14 pF, 20 pF, 35 pF, and 40 pF, and the achieved operating frequencies range between 403 MHz and 1.1 GHz. The simulation result shows that the energy dissipation increases as the operating frequency increases, as indicated by Eq. (4).

To compare energy dissipation with conventional static CMOS, a PE with the same architecture is designed with conventional static CMOS. A post-layout simulation is performed to verify how the energy dissipation changes with operating frequency. Fig. 11 shows the comparison of energy dissipation between the PE with static CMOS and the PE with pNBL gates. As explained previously, the energy dissipation of the pNBL gate increases linearly with operating frequency. In the static CMOS case, the energy dissipation per cycle is expressed as

$$E_{CMOS} = \frac{1}{2} C_L V_{DD}^2 \tag{7}$$

where $C_L$ is the load capacitance and $V_{DD}$ is the power supply voltage. To achieve better energy performance, a lower $V_{DD}$ is used in the low operating frequency range, and a higher $V_{DD}$ is required in the high operating frequency range to make sure the circuit works correctly. In Fig. 11, when the operating frequency is higher than 900 MHz, $V_{DD}$ = 1.8 V is required, and the energy dissipation hardly changes as the operating frequency varies.

Fig. 11. Energy dissipation comparison between static CMOS and pNBL.

When the operating frequency is in the low range (403 MHz), the energy dissipation of the pNBL PE is 36% of that dissipated by the static CMOS PE. When the operating frequency is higher than 1.1 GHz, the energy dissipation of the static CMOS PE and the pNBL PE is at the same level.
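Energy per cycle and average power are related by $P = E \cdot f$. The short sketch below applies this relation to the simulated operating points reported in this paper (1 pJ/cycle at 403 MHz and 3.5 pJ/cycle at 1.1 GHz). The static CMOS figure at the end is only a derived estimate based on the stated 36% ratio, not a reported value, and the names are hypothetical.

```python
import math


def average_power_w(energy_per_cycle_j, frequency_hz):
    """Average power when a fixed energy is dissipated every clock cycle."""
    return energy_per_cycle_j * frequency_hz


# Simulated operating points reported in the paper: (energy per cycle, frequency).
SIM_POINTS = [(1.0e-12, 403e6), (3.5e-12, 1.1e9)]

for energy_j, freq_hz in SIM_POINTS:
    power_w = average_power_w(energy_j, freq_hz)
    print(f"{energy_j * 1e12:.1f} pJ/cycle at {freq_hz / 1e6:.0f} MHz -> {power_w * 1e3:.3f} mW")

# Derived estimate (not a reported value): if 1 pJ is 36% of the static CMOS
# energy at 403 MHz, the static CMOS PE dissipates roughly 1 pJ / 0.36 per cycle.
implied_cmos_energy_j = 1.0e-12 / 0.36
print(f"implied static CMOS energy: {implied_cmos_energy_j * 1e12:.2f} pJ/cycle")
```

The two computed powers, 403 µW and 3.85 mW, agree with the simulated power dissipation listed in Table 2.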
To improve the energy efficiency in the high frequency range, the clock generator should be further improved by using a higher-*Q* on-chip inductor. For LDPC applications, the operating frequency is at the level of several hundred megahertz, so according to Fig. 11, the pNBL PE achieves lower energy dissipation than the static CMOS PE in this range.

**Table 1.** Performance Comparison Table

| Items | This Work | EBL [10] | PBL [12] |
|-------|-----------|----------|----------|
| Technology | 0.18 μm | 0.18 μm | 0.18 μm |
| Power supply | 1.2 V, 2-phase, sine wave | 1.2 V, 2-phase, sine wave; 0.4 V, DC | 1.2 V, 2-phase, sine wave |
| Simulation frequency range (400 MHz to max frequency) | 400 MHz to 1.5 GHz | 400 MHz to 1.2 GHz | 400 MHz to 1.5 GHz |
| Energy dissipation @ 400 MHz | 1.0 pJ/cycle | 1.4 pJ/cycle | 1.2 pJ/cycle |

Table 1 compares the performance of pNBL with EBL and PBL. PEs with the same structure are designed with EBL and PBL for comparison. According to the comparison result, the proposed pNBL achieves lower power dissipation than EBL and PBL.

### 2. Test Chip Measurement

The proposed PE module with pNBL was fabricated in 0.18 $\mu$m CMOS process technology. Fig. 12 shows the microphotograph of the test chip, including the power clock generator and the PE module. The area of the Processing Engine was 230×161 $\mu$m², and the whole system including the power clock generator was 558×240 $\mu$m².

Fig. 12. Microphotograph of test chip.

The center-tapped on-chip inductor used in the power clock generator was 3 nH. Off-chip capacitors of 40 pF, 37 pF, 35 pF, 31 pF, 27 pF, 25 pF, and 20 pF were used to adjust the resonance frequency of the power clock generator, and the achieved operating frequencies were 404 MHz, 420 MHz, 440 MHz, 476 MHz, 507 MHz, 549 MHz, and 609 MHz, respectively.
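For orientation, the listed capacitor values can be compared against the textbook resonance formula $f = 1/(2\pi\sqrt{LC})$. The sketch below assumes the tank behaves as a single ideal 3 nH inductor with only the listed off-chip capacitor; this is a simplification, since the capacitance of the driven circuit and any parasitics are not included, so it is an estimate and not a model of the blip generator. The function and variable names are hypothetical.

```python
import math


def lc_resonance_hz(inductance_h, capacitance_f):
    """Resonance frequency of an ideal LC tank."""
    return 1.0 / (2.0 * math.pi * math.sqrt(inductance_h * capacitance_f))


INDUCTANCE_H = 3e-9  # on-chip inductor value stated in the text

# (off-chip capacitor, reported operating frequency) pairs from the measurement.
MEASURED = [
    (40e-12, 404e6), (37e-12, 420e6), (35e-12, 440e6), (31e-12, 476e6),
    (27e-12, 507e6), (25e-12, 549e6), (20e-12, 609e6),
]

ideal = [lc_resonance_hz(INDUCTANCE_H, cap) for cap, _ in MEASURED]

for (cap, f_reported), f_ideal in zip(MEASURED, ideal):
    print(f"{cap * 1e12:4.0f} pF: ideal {f_ideal / 1e6:6.1f} MHz, "
          f"reported {f_reported / 1e6:5.0f} MHz")
```

Under these assumptions the ideal estimate is about 459 MHz for 40 pF and about 650 MHz for 20 pF, and it lies above the reported frequency for every capacitor, which would be consistent with additional capacitance that the simple formula leaves out.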
First, the function of the Processing Engine was tested; the result showed that the test chip worked correctly at all achieved frequencies. Then the energy dissipation was measured: the test chip dissipated about 1 pJ/cycle (1.1 pJ/cycle in Table 2) at the low frequency of 404 MHz and 2.1 pJ/cycle at the high frequency of 609 MHz. For comparison with the measurement result, a simulation in the range between 400 MHz and 600 MHz was also run. Fig. 13 compares the measurement result with the simulation result as the operating frequency varies. The trend of the measured result is the same as that of the simulation result, but the measured energy dissipation is slightly larger than the simulated result. This is because elements present in the measurement, such as wire resistance, are not included in the simulation. From the measurement result, the load capacitance $C_L$ and the equivalent resistance $R_e$ can be derived. Estimating $C_L$ at about 1 pF gives an $R_e$ of about 3 k$\Omega$; both are reasonable values. Table 2 summarizes the performance of the test chip.

Fig. 13. Energy dissipation comparison between measurement and simulation.

Table 2. Performance summary of pNBL gate PE

| Items | Value |
|-------|-------|
| Operation frequency range | 404 MHz to 609 MHz |
| Technology | 0.18 μm CMOS process |
| Total number of transistors (including static CMOS interface) | PMOS: 563<br>NMOS: 1083 |
| Power supply for oscillator | DC 0.8 V |
| Power supply for data A/D converter | DC 1.2 V |
| Energy dissipation per cycle in simulation | 1 pJ @ 403 MHz<br>3.5 pJ @ 1.1 GHz |
| Power dissipation in simulation | 403 μW @ 403 MHz<br>3.85 mW @ 1.1 GHz |
| Energy dissipation per cycle in measurement | 1.1 pJ @ 404 MHz<br>2.1 pJ @ 609 MHz |
| Power dissipation in measurement | 444 μW @ 404 MHz<br>1.28 mW @ 609 MHz |
| Test chip area | 558×240 µm² |

## V. CONCLUSIONS

In this paper, a novel charge recovery logic structure called pseudo-NMOS boost logic (pNBL) is proposed. Compared with other charge recovery logic, the proposed pNBL offers high speed and low power dissipation with fewer transistors. A Processing Engine used in an LDPC decoding system is designed with this charge recovery logic and implemented in a standard 0.18 µm CMOS process technology. The simulation results show that the proposed PE works correctly at an operating frequency of 1.5 GHz, and that when the operating frequency is lower than 1.1 GHz, the PE with pNBL gates achieves lower power dissipation than the PE with conventional static CMOS gates. In the frequency range of several hundred megahertz, where LDPC applications usually operate, the energy dissipation of the PE with pNBL gates is greatly reduced. The proposed PE dissipates 3.5 pJ per cycle at 1.1 GHz and 1 pJ per cycle at 403 MHz in simulation. The latter is only 36% of that of the PE with static CMOS gates. Compared with other charge recovery logic, pNBL also performs better in energy dissipation. The test chip was fabricated and measured; the result showed that the test chip can work at frequencies up to 609 MHz with an energy dissipation of 2.1 pJ/cycle, including the PE module and the blip power clock generator.

## **ACKNOWLEDGMENTS**

This research was supported by the Waseda University Global COE Program "International Research and Education Center for Ambient SoC" sponsored by the Ministry of Education, Culture, Sports, Science and Technology, Japan. The authors would like to thank the VLSI Design and Education Center (VDEC), the University of Tokyo, for the fabrication of the test chip.

## REFERENCES

- [1] Y. Moon and D. K. Jeong, "Efficient charge recovery logic," *VLSI Circuits, 1995. Digest of Technical Papers., 1995 Symposium on*, pp. 129–130, Jun. 1995.
- [2] D. Suvakovic and C.
Salama, "Two phase non-overlapping clock adiabatic differential cascode voltage switch logic (ADCVSL)," *Solid-State Circuits Conference, 2000. Digest of Technical Papers. ISSCC. 2000 IEEE International*, pp. 364–365, 2000.
- [3] H. Jianping, C. Lizhang, and L. Xiao, "A new type of low-power adiabatic circuit with complementary pass-transistor logic," *ASIC, 2003. Proceedings. 5th International Conference on*, vol. 2, pp. 1235–1238, Oct. 2003.
- [4] Y. Ye and K. Roy, "QSERL: quasi-static energy recovery logic," *Solid-State Circuits, IEEE Journal of*, vol. 36, no. 2, pp. 239–248, Feb. 2001.
- [5] V. De and J. Meindl, "Complementary adiabatic and fully adiabatic MOS logic families for gigascale integration," *Solid-State Circuits Conference, 1996. Digest of Technical Papers. 42nd ISSCC., 1996 IEEE International*, pp. 298–299, 461, Feb. 1996.
- [6] Y. Takahashi, K. Konta, K. Takahashi, M. Yokoyama, K. Shouno, and M. Mizunuma, "Carry propagation free adder/subtracter using adiabatic dynamic CMOS logic circuit technology," *Fundamentals of Electronics, Communications and Computer Sciences, IEICE Transactions on*, vol. E86-A, no. 6, pp. 1437–1444, Jun. 2003.
- [7] Y. Takahashi, T. Sekine, and M. Yokoyama, "VLSI implementation of a 4x4-bit multiplier in a two phase drive adiabatic dynamic CMOS logic," *Electronics, IEICE Transactions on*, vol. E90-C, no. 10, pp. 2002–2006, Oct. 2007.
- [8] W. Athas, L. Svensson, J. Koller, N. Tzartzanis, and E. Ying-Chin Chou, "Low-power digital systems based on adiabatic-switching principles," *Very Large Scale Integration (VLSI) Systems, IEEE Transactions on*, vol. 2, no. 4, pp. 398–407, Dec. 1994.
- [9] V. S. Sathe, J. Y. Chueh, and M. C. Papaefthymiou, "Energy-efficient GHz-class charge-recovery logic," *Solid-State Circuits, IEEE Journal of*, vol. 42, no. 1, pp. 38–47, Jan. 2007.
- [10] J. C. Kao, W. H. Ma, V. S. Sathe, and M.
Papaefthymiou, "Energy-efficient low-latency 600 MHz FIR with high-overdrive charge-recovery logic," *Very Large Scale Integration (VLSI) Systems, IEEE Transactions on*, vol. PP, no. 99, pp. 1–12, 2011.
- [11] W. H. Ma, J. C. Kao, V. S. Sathe, and M. Papaefthymiou, "A 187 MHz subthreshold-supply robust FIR filter with charge-recovery logic," *VLSI Circuits, 2009 Symposium on*, pp. 202–203, Jun. 2009.
- [12] Y. Zhang, L. Okamura, and T. Yoshihara, "An energy efficiency 4-bit multiplier with two-phase non-overlap clock driven charge recovery logic," *Electronics, IEICE Transactions on*, vol. E94-C, no. 4, pp. 605–612, Apr. 2011.
- [13] L. Chen, J. Xu, I. Djurdjevic, and S. Lin, "Near-Shannon-limit quasi-cyclic low-density parity-check codes," *Communications, IEEE Transactions on*, vol. 52, no. 7, pp. 1038–1042, Jul. 2004.
- [14] A. Darabiha, A. Chan Carusone, and F. Kschischang, "Power reduction techniques for LDPC decoders," *Solid-State Circuits, IEEE Journal of*, vol. 43, no. 8, pp. 1835–1845, Aug. 2008.
- [15] Y. Zhang, M. Huang, N. Wang, S. Goto, and T. Yoshihara, "A 1pJ/cycle processing engine in LDPC application with charge recovery logic," *Solid-State Circuits Conference, 2011. A-SSCC 2011. IEEE Asian*, pp. 213–216, Nov. 2011.
- [16] J. P. Uyemura, *Introduction to VLSI Circuits and Systems*, John Wiley & Sons, Inc., 2002.
- [17] W. Athas, L. Svensson, and N. Tzartzanis, "A resonant signal driver for two-phase, almost-non-overlapping clocks," *Circuits and Systems, 1996. ISCAS '96., 'Connecting the World'., 1996 IEEE International Symposium on*, vol. 4, pp. 129–132, 12–15 May 1996.
- [18] A. Drake, K. Nowka, T. Nguyen, J. Burns, and R. Brown, "Resonant clocking using distributed parasitic capacitance," *Solid-State Circuits, IEEE Journal of*, vol. 39, no. 9, pp. 1520–1528, Sep. 2004.
- [19] B. Mesgarzadeh, M. Hansson, and A. Alvandpour, "Jitter characteristic in charge recovery resonant clock distribution," *Solid-State Circuits, IEEE Journal of*, vol. 42, no.
7, pp. 1618–1625, Jul. 2007.

**Yimeng Zhang** received the B.S. degree in Electronic Science and Technology from Tsinghua University in 2005, and the M.E. degree in Information, Production and Systems from Waseda University in 2007. He is currently a Ph.D. candidate in the Graduate School of Information, Production and Systems, Waseda University, Japan. His research interests include low energy dissipation circuit structures, especially AC power supply circuits.

**Mengshu Huang** received the B.E. degree in Communication Engineering from Fudan University, Shanghai, China, in 2007, and the M.E. degree from the Graduate School of Information, Production and Systems, Waseda University, Fukuoka, Japan, in 2009. He is now a Ph.D. candidate in the same graduate school, and is currently involved in research on on-chip power supply technology, especially charge pump systems.

**Nan Wang** received the B.S. degree in Electronic Engineering from Fudan University in 2009 and the M.E. degree from the Graduate School of Information, Production and Systems, Waseda University, in 2011. His research interests include low energy dissipation circuit structures and charge recovery logic.

**Satoshi Goto** was born on January 3rd, 1945 in Hiroshima, Japan. He received the B.E. and M.E. degrees in Electronics and Communication Engineering from Waseda University in 1968 and 1970, respectively. He also received the Dr. of Engineering degree from the same university in 1981. He is an IEEE Fellow, a member of the Academy Engineering Society of Japan, and a professor at Waseda University. His research interests include LSI systems and multimedia systems.

**Tsutomu Yoshihara** received the B.S. and M.S. degrees in physics and the Ph.D. degree in electronic engineering from Osaka University, Osaka, Japan, in 1969, 1971, and 1983, respectively. In 1971, he joined the ULSI Laboratory of Mitsubishi Electric Corporation, Hyogo, Japan, where he has been engaged in the research and development of MOS LSI memories.
Since 2003, he has been a professor in the Graduate School of Information, Production and Systems, Waseda University, Fukuoka, Japan, and is currently involved in research on system LSI. He is a member of the IEEE Solid-State Circuits Society, the Institute of Electronics, Information and Communication Engineers (IEICE) of Japan, and the Institute of Electrical Engineers of Japan.