# STT-MRAM Read-circuit with Improved Offset Cancellation

Dong-Gi Lee and Sang-Gyu Park

Abstract—We present an STT-MRAM read-circuit which mitigates the performance degradation caused by offsets from device mismatches. In the circuit, a single current source supplies the read-current to both the data and the reference cells sequentially, eliminating potential mismatches. Furthermore, an offset-free pre-amplification using a capacitor that stores the mismatch information is employed to lessen the effect of the comparator offset. The proposed circuit was implemented using a 130-nm CMOS technology, and Monte Carlo simulations of the circuit demonstrate its effectiveness in suppressing the effect of device mismatch.

*Index Terms*—STT-MRAM, read-circuit, offset cancellation, sensing margin

# I. INTRODUCTION

Spin-transfer torque magnetic random access memory (STT-MRAM) is receiving much attention because it combines the advantages of DRAM, which offers high speed and low power consumption, and Flash, which offers non-volatility and high integration density [1-6]. A typical STT-MRAM memory cell consists of a selection transistor and a magnetic tunnel junction (MTJ), and an MTJ consists of two ferromagnetic layers separated by a

Manuscript received May 11, 2016; accepted May 1, 2017

Dept. of Electronics and Computer Engineering, Hanyang University, 222 Wangsimni-ro, 133-791, Seoul, Korea

thin oxide layer. The resistance of an MTJ is low ( $R_L$ ) when the magnetization directions of the ferromagnetic layers are parallel, and it is high ( $R_H$ ) when they are anti-parallel. The magnetization direction of one of the ferromagnetic layers is fixed (fixed layer), while that of the other layer (free layer) can be switched by applying a large current pulse through the MTJ. The direction of the free layer, and thus the resistance of the MTJ cell, represents the stored data.

In a typical STT-MRAM read-circuit, identical currents are supplied to a data MTJ cell and a reference cell, and the difference between the voltages produced by the cells is measured by a sense amplifier (or a comparator). If the MTJ is in the high (or low) resistance state, it produces a higher (or lower) voltage. The reference cells are designed to have a resistance between  $R_H$  and  $R_L$  so that they produce a voltage between those produced by MTJ cells in the high and the low resistance states.

One of the challenges confronted by designers of STT-MRAM read-circuits is that the contrast between  $R_H$  and  $R_L$  is not large in today's MTJ technologies. The contrast is usually represented by the tunnel magneto-resistance (TMR), which is defined as TMR  $\equiv (R_H - R_L)/R_L$  [2] and is usually lower than 100% for practical devices. Therefore, the output voltage difference between the two resistance states is small.
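As a rough illustration of why a small TMR implies a small signal, consider an idealized case in which the same read current, here called $I_{read}$ (a symbol introduced only for this sketch), flows through the MTJ in either state, and the selection transistor and the bias dependence of the TMR are neglected. The definition above then gives

$$
\Delta V_{HL} = I_{read} R_H - I_{read} R_L = I_{read} \cdot \mathrm{TMR} \cdot R_L .
$$

If the reference voltage is assumed to sit midway between the two cell voltages, the margin available on either side of the reference is only half of this value, $I_{read} \cdot \mathrm{TMR} \cdot R_L / 2$.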

On the other hand, STT-MRAM read-circuits have input-referred offset voltages produced by mismatches between the devices. Because the signal voltage is very small due to small TMRs of MTJs, even a small offset degrades the memory performance seriously. Therefore, a scheme which can cancel the effect of the mismatches is desired, and many studies have been reported [7-10].

E-mail : sanggyu@hanyang.ac.kr

This work was supported by the future semiconductor device technology development program (10044608) funded by MOTIE (ministry of trade, industry & energy) and KSRC (Korea semiconductor research consortium).



Fig. 1. Schematic of the conventional read circuit [11].

The offset-canceling triple-stage (OCTS) read circuit proposed in [7, 8] and the time-differential sensing circuit (TDSC) proposed in [9] employ multi-phase operations and sacrifice sensing speed in exchange for a reduction of the offset in the sensing circuit. The latch-offset-cancellation sense amplifier (LOC-SA) proposed in [10] employs an offset-free pre-amplification of the small voltage difference so that the offset of the comparator itself can be ignored.

In this work, we propose a new read-circuit for STT-MRAM, in which the offset caused by mismatches between the current paths for the data cell and the reference cell can be completely removed. Furthermore, the signal voltage, which is the voltage difference between the data cell and the reference cell, is pre-amplified by a positive feedback loop without any offset before it is fed into a latch. Therefore, when the latch finally determines the state of the memory cell, its offset does not degrade the read-circuit performance.

This paper is organized as follows. Sec. II introduces offset-cancelling circuits proposed in prior publications. Sec. III presents the proposed circuit and describes its operation. Sec. IV presents the results of the SPICE-level simulations of the circuit and compares them with those of the existing offset-cancelling circuits. Finally, Sec. V concludes this paper.

# II. PREVIOUS CIRCUITS FOR STT-MRAM

Fig. 1 shows a conventional STT-MRAM read-circuit [11]. As mentioned in the Introduction, identical currents are applied to the data and the reference cells to produce voltages according to the resistance of the cells. Then, they are compared by a sense amplifier to produce the



Fig. 2. Schematic of OCTS read circuit [7, 8].

digital output. The circuit of Fig. 1 is vulnerable to any offset caused by mismatches in parameters such as threshold voltage and device size. For example, if the current source  $M_7$  produces a larger current than  $M_8$  does, then even when the resistance of the Data cell is lower than that of the Reference cell, the Data cell voltage can be higher than the Reference cell voltage, leading to an error.

Fig. 2 shows the OCTS read circuit proposed in [7, 8]. In an OCTS circuit, a single current source is used to generate both the data voltage and the reference voltage to eliminate the current source mismatch. In phase 1, the read current from  $M_6$  is applied to the reference cell, and the produced reference voltage is stored in capacitor  $C_0$ . In phase 2, the current is applied to a data cell, and the produced data-cell voltage is stored in  $C_1$ . Finally, in phase 3, the reference voltage stored in  $C_0$  and the data voltage stored in  $C_1$  are compared by a sense amplifier to produce the digital output. This circuit prevents the reduction of sensing margin caused by mismatches between the current sources for the data cell and the reference cell. However, because it uses two clock phases to sample the data and the reference voltages, its operation is slower than that of the conventional circuit. More importantly, the input-referred offset of the sense amplifier is not removed at all in this circuit, and its effect can be significant given the small TMR of MTJs.

Fig. 3 shows the LOC-SA proposed in [10]. This circuit consists of current sources ( $M_3 \sim M_6$ ) and a latch which determines the state of memory cells ( $M_1 \sim M_4$ ). Note that the current source transistors also serve as parts of the latch. In phase 1, the data and the reference voltages are equalized by shorting  $V_{out}$  and  $\overline{V_{out}}$  nodes while supplying currents from the current sources. In phase 2,  $V_{out}$  and  $\overline{V_{out}}$  nodes are disconnected and the



Fig. 3. Schematic of LOC-SA [10].

voltage difference  $\Delta V = V_{out} - \overline{V_{out}}$  is developed. In phase 3, a positive feedback loop is formed and the  $\Delta V$  from phase 2 is amplified. Finally, in phase 4, the latch formed by  $M_1 \sim M_4$  is activated and produces a digital output.

One of the main points of the LOC-SA read-circuit is that the pre-amplification by the positive feedback loop in phase 3 does not suffer from device mismatch. However, the LOC-SA circuit still suffers from device mismatches in the signal development stage of phase 2. If the mismatch is large enough, the sign of  $\Delta V$  can be reversed, producing an error. The source degeneration ( $M_5$  and  $M_6$ ) employed in [12] can reduce the mismatches between the current sources but cannot prevent an offset completely.

# III. PROPOSED CIRCUIT DESCRIPTION

Fig. 4 shows a schematic diagram of the proposed read-circuit, which combines the benefits of the OCTS circuit and the LOC-SA using a novel structure. Here, identical currents are supplied to the data cell and the reference cell using the same transistor as the current source. The difference between the generated voltages is pre-amplified by a positive feedback structure and then sensed by a latch. The capacitor  $C_0$  stores the offset voltage caused by mismatches so that the pre-amplification and the comparison circuits do not suffer from an offset.

Fig. 5(a)-(d) shows simplified schematics of the proposed circuit in each clock phase. Fig. 5(a) shows the circuit in phase 1.  $M_3$  and  $M_4$  supply nominally identical currents to Ref<sub>0</sub> and Ref<sub>1</sub>. Therefore, ideally the voltages



Fig. 4. Schematic of the proposed read-circuit.

at the drains of  $M_5$  and  $M_6$  should be identical. However, in practice, there are mismatches between the devices in the two current paths leading to a voltage difference developed between them. This offset voltage is stored in capacitor  $C_0$ .

Fig. 5(b) shows the circuit in phase 2. After the right-hand side of  $C_0$  is disconnected from the gates of  $M_3$  and  $M_4$ , Ref<sub>0</sub> is replaced by the Data cell. The current sources  $M_3$  and  $M_4$  supply the same currents as in phase 1. Therefore, if the resistance of the Data cell is lower than that of Ref<sub>0</sub>, then  $V_{out}$  will go down, and vice versa. It should be noted that if the Data cell resistance is the same as the resistance of Ref<sub>0</sub>, then even with device mismatches in phase 1, the voltage change  $\Delta V$  will be zero. It should also be noted that Ref<sub>1</sub> is used only to set up the operating condition, and the real signal voltage  $\Delta V$  depends only on the resistances of the Data cell and the Ref<sub>0</sub> cell.

Fig. 5(c) shows the circuit in phase 3. Here, the diode connection of  $M_4$  is broken and the gate of  $M_4$  is connected to the right-hand side plate of  $C_0$  to establish a positive feedback loop. The loop functions as a pre-amplifier which increases the voltage difference between the  $V_{out}$  and  $\overline{V_{out}}$  nodes.

Fig. 5(d) shows the circuit in phase 4, in which the latch formed by  $M_1 \sim M_4$  produces the digital output. Our offset cancellation scheme cannot prevent the latch from having an offset voltage. However, the input to the latch is usually much larger than the latch offset thanks to the



Fig. 5. Proposed read-circuit in (a) phase 1, (b) phase 2, (c) phase 3, (d) phase 4.

Table 1. MTJ Parameters

| Parameter | Description              | Default Value            |
|-----------|--------------------------|--------------------------|
| TMR(0)    | TMR at zero bias voltage | 100 %                    |
| RA        | Resistance-area product  | $5 \Omega \cdot \mu m^2$ |
| $t_{ox}$  | Height of oxide barrier  | 0.85 nm                  |
| Area      | Surface of MTJ           | 40 nm × 40 nm × π/4      |
| V         | Volume of free layer     | Area × 1.3 nm            |

pre-amplification in phase 3. Therefore, an offset of the latch does not significantly affect the error performance of the read-circuit, if at all. This is in contrast to the circuit of Fig. 1, where a small  $\Delta V$  is applied directly to the sense amplifier without amplification, so that even a small offset of the sense amplifier seriously degrades the read performance.
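The four-phase operation described above can be summarized with a highly idealized behavioral sketch. It is not a model of the transistor-level circuit of Fig. 4: all mismatches of a current path are lumped into a single hypothetical voltage `path_offset_v`, the pre-amplifier is reduced to a hypothetical linear gain `preamp_gain`, and the function names and the midpoint reference resistance are assumptions made for illustration only.

```python
def conventional_read(r_data, r_ref, i_read, path_offset_v=0.0):
    """Fig. 1 style read: data and reference use separate current paths,
    so the lumped path offset (hypothetical) appears on the data side only."""
    v_data = i_read * r_data + path_offset_v
    v_ref = i_read * r_ref
    return 1 if v_data > v_ref else 0


def proposed_read(r_data, r_ref0, i_read, path_offset_v=0.0,
                  latch_offset_v=0.0, preamp_gain=20.0):
    """Behavioral abstraction of the four phases (not the real circuit)."""
    # Phase 1: the path drives Ref0; the result, including the path
    # offset, is what the capacitor C0 remembers.
    v_phase1 = i_read * r_ref0 + path_offset_v
    # Phase 2: the same path now drives the Data cell, so the same
    # offset appears again and drops out of the difference.
    v_phase2 = i_read * r_data + path_offset_v
    delta_v = v_phase2 - v_phase1
    # Phase 3: offset-free pre-amplification (assumed linear gain).
    v_amplified = preamp_gain * delta_v
    # Phase 4: the latch decides; its own offset is small compared
    # with the amplified signal.
    return 1 if v_amplified > latch_offset_v else 0


if __name__ == "__main__":
    i_read = 36e-6                      # read current used in Sec. IV
    r_low, r_high = 4000.0, 8000.0      # nominal values for TMR = 100 %
    r_ref = 6000.0                      # assumed midpoint reference
    offset = 0.08                       # hypothetical lumped offset in volts
    print("conventional, LOW cell :", conventional_read(r_low, r_ref, i_read, offset))
    print("proposed,     LOW cell :", proposed_read(r_low, r_ref, i_read, offset, 0.02))
    print("proposed,     HIGH cell:", proposed_read(r_high, r_ref, i_read, offset, 0.02))
```

In this abstraction the lumped offset flips the conventional read of a LOW cell, while the proposed sequence is unaffected because the same offset is present in both phase 1 and phase 2. The 80 mV lumped offset used here is not equivalent to the threshold-voltage shift applied in the simulations of Sec. IV.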

# IV. SIMULATION RESULTS AND DISCUSSION

We implemented the proposed STT-MRAM read-circuit using a 130 nm CMOS process with a 1.5 V supply voltage and verified the performance of the circuit with SPICE-level simulations in Spectre®. The parameters of the MTJ used in the simulations are summarized in Table 1. The nominal MTJ resistances calculated from the parameters in Table 1 were  $R_L \approx 4k\Omega$  and  $R_H \approx 8k\Omega$  for TMR = 100 %. For TMRs other than 100%,  $R_H$  was varied while  $R_L$  was fixed. The read current was set at 36  $\mu$A.
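The nominal resistances quoted above follow from the Table 1 parameters if $R_L$ is taken as the resistance-area product divided by the junction area and $R_H$ is obtained from the TMR definition. The short sketch below reproduces that arithmetic; the function name is hypothetical, and the voltage estimate at the end ignores the selection transistors and the bias dependence of the TMR.

```python
import math


def mtj_resistances(ra_ohm_um2, diameter_nm, tmr):
    """Return (R_L, R_H) in ohms for a circular MTJ.

    ra_ohm_um2  : resistance-area product in ohm * um^2
    diameter_nm : pillar diameter in nm (area = pi/4 * d^2)
    tmr         : TMR as a fraction (1.0 means 100 %)
    """
    area_um2 = math.pi / 4.0 * (diameter_nm * 1e-3) ** 2
    r_low = ra_ohm_um2 / area_um2
    r_high = r_low * (1.0 + tmr)  # from TMR = (R_H - R_L) / R_L
    return r_low, r_high


if __name__ == "__main__":
    r_low, r_high = mtj_resistances(ra_ohm_um2=5.0, diameter_nm=40.0, tmr=1.0)
    i_read = 36e-6  # read current in amperes
    print(f"R_L = {r_low:.0f} ohm, R_H = {r_high:.0f} ohm")
    print(f"MTJ voltage drop: {i_read * r_low * 1e3:.0f} mV (low), "
          f"{i_read * r_high * 1e3:.0f} mV (high)")
```

With the Table 1 values the result is close to 4 kΩ and 8 kΩ, consistent with the nominal resistances stated in the text.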

The area of the designed sense-amplifier was  $120 \ \mu m^2$ . This includes the area of  $25 \ \mu m^2$  occupied by a capacitor of 11.6 fF, which was the minimum size allowed by the CMOS technology used in this work. Fig. 6 shows the results of the post-layout transient simulations of the read operations, which include the effect of the parasitic components from post-layout parameter extractions. Note that the parasitics related to MTJs were not included in the post-layout simulations.



Fig. 6. Simulated waveforms from the proposed read-circuit: (a) 4-phase clock, (b) consecutive reading of HIGH and LOW data cells without any offset, (c) consecutive reading of HIGH and LOW data cells with an offset ( $V_{TH4}=V_{TH3}+80$  mV).

Fig. 6(a) shows waveforms of the 4-phase clock used in the simulations and Fig. 6(b) shows the output voltage waveforms from two consecutive read operations, one reading a HIGH cell and the other reading a LOW cell. We can observe the storing of the offset-voltage in phase 1, the signal voltage development in phase 2, the preamplification in phase 3, and the latch operation in phase 4 clearly.

The simulations for Fig. 6(b) were performed assuming perfect device matching (i.e., zero offset voltage). To verify the effectiveness of the proposed circuit in eliminating the effect of mismatches, we performed simulations emulating a large mismatch between the threshold voltages of the current source transistors ( $M_3$  and  $M_4$  in Fig. 4) by adding a dc voltage source of 80 mV to the gate of  $M_4$ . Fig. 6(c) shows the results of the simulations. We can observe that the mismatch in the threshold voltage leads to the development of a voltage difference between the cell voltages (Ref<sub>0</sub> and Ref<sub>1</sub>) in phase 1, which is stored in  $C_0$  and subsequently used in the pre-amplification of phase 3. We can also observe that the correct result was obtained when reading either the HIGH or the LOW state.

In the simulations of Fig. 6(c), only the mismatch between the threshold voltages of  $M_3$  and  $M_4$  in Fig. 4 was considered. To verify the capability of the proposed circuit to cancel the offsets from various mismatches, we performed Monte-Carlo simulations, which account for variations of the MTJ resistance as well as MOSFET variations. For the MTJ resistance, a uniform distribution with a maximum deviation of ±14% was used. For the MOSFET parameter variations, a manufacturer-supplied library for the 130 nm CMOS process was employed.
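To make the MTJ variation model concrete, the sketch below draws resistances from a uniform distribution with a maximum deviation of ±14% and counts how often a resistance-only comparison against a reference would give the wrong bit. This is a toy model, not the Monte-Carlo setup of this work: it ignores all MOSFET variations and circuit behavior, assumes a single reference resistance placed at the midpoint of the nominal $R_L$ and $R_H$ and subject to the same variation, and uses hypothetical function names.

```python
import random


def sample_uniform(nominal, max_dev, rng):
    """Draw a value uniformly from nominal * (1 +/- max_dev)."""
    return nominal * (1.0 + rng.uniform(-max_dev, max_dev))


def resistance_only_error_rate(tmr, n_trials, r_low_nominal=4000.0,
                               max_dev=0.14, seed=0):
    """Fraction of trials in which comparing the sampled data resistance
    with the sampled reference resistance yields the wrong bit."""
    rng = random.Random(seed)
    r_high_nominal = r_low_nominal * (1.0 + tmr)
    # Assumption: the reference sits at the midpoint of the nominal values.
    r_ref_nominal = 0.5 * (r_low_nominal + r_high_nominal)
    errors = 0
    for _ in range(n_trials):
        stored_high = rng.random() < 0.5
        r_nominal = r_high_nominal if stored_high else r_low_nominal
        r_data = sample_uniform(r_nominal, max_dev, rng)
        r_ref = sample_uniform(r_ref_nominal, max_dev, rng)
        read_high = r_data > r_ref
        errors += read_high != stored_high
    return errors / n_trials


if __name__ == "__main__":
    for tmr in (0.3, 0.5, 1.0):
        rate = resistance_only_error_rate(tmr, n_trials=20000)
        print(f"TMR = {tmr:.0%}: resistance-only error rate = {rate:.4f}")
```

In this toy model the sampled resistance ranges no longer overlap once the TMR is large enough, so the remaining errors in a real circuit must come from the transistor mismatches that the sketch leaves out.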

Fig. 7 compares the error rates of the proposed STT-MRAM read-circuit with those of the existing circuits. For a fair comparison, similar device sizes and read currents were used for all read-circuits. For an accurate estimation of the error rates, we repeated the simulations until at least 50 errors were counted for each case. The only exception was the case of the proposed circuit with TMR = 150%, for which we did not observe a single error even after 330,000 iterations.
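The statistical meaning of these stopping criteria can be estimated with standard binomial reasoning, assuming independent Monte-Carlo trials. The sketch below is not part of the original analysis; the function names are hypothetical. It gives the approximate relative uncertainty of an error rate estimated from a given number of counted errors, and the one-sided upper bound on the error rate when no error is observed.

```python
import math


def relative_std_error(n_errors):
    """Approximate relative standard deviation of an error-rate estimate
    based on n_errors counted errors (rare-event approximation)."""
    return 1.0 / math.sqrt(n_errors)


def zero_error_upper_bound(n_trials, confidence=0.95):
    """One-sided upper bound on the error probability p when zero errors
    are seen in n_trials independent trials: solve (1 - p)^n = 1 - confidence."""
    return -math.expm1(math.log(1.0 - confidence) / n_trials)


if __name__ == "__main__":
    print(f"relative uncertainty with 50 errors: {relative_std_error(50):.1%}")
    bound = zero_error_upper_bound(330_000)
    print(f"95 % upper bound with 0 errors in 330,000 runs: {bound:.2e}")
```

Under these assumptions, counting 50 errors fixes the error rate to within roughly 14% (one standard deviation), and zero errors in 330,000 runs bounds the error rate below about 1e-5 at 95% confidence.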

In Fig. 7, the square symbols represent the error rates from the conventional read-circuit of Fig. 1. In the



Fig. 7. Error rate obtained from Monte Carlo simulations.

simulations for the conventional circuit, we used an ideal sense amplifier without any offset of its own. We observe very high error rates even with this ideal sense amplifier. Even at TMR = 150%, the error rate exceeds 0.01 (1%).

The circular symbols in Fig. 7 represent the error rates from the LOC-SA circuit of Fig. 3. The main advantage of this circuit comes from the offset-free pre-amplification, which enables us to ignore the offset of the final latch. In this respect, the error rate of this circuit should be identical to that of the conventional circuit with an ideal sense amplifier. In Fig. 7, we observe that the error rate of LOC-SA is somewhat lower than that of the conventional circuit. This reduction of the error rate comes from the source-degenerated current sources employed in LOC-SA. We verified that, without the source-degenerated current sources, almost identical results were obtained from the conventional read-circuit and LOC-SA (not shown).

Finally, the triangles in Fig. 7 represent the error rates from the proposed circuit. We can observe that the error rates from the proposed circuit are much lower than those of LOC-SA. This illustrates the effectiveness of the proposed circuit in offset-canceling, which is enabled by the combination of the offset-free pre-amplification with the current sourcing scheme that uses the same transistor to supply the current for the data cell and the reference cell. The remaining errors are mainly caused by mismatches between the bit-line and word-line selection transistors ( $M_7$ ,  $M_{10}$  or  $M_8$ ,  $M_{11}$  in Fig. 4) as well as the mismatches between the MTJs themselves. Since they are integral parts of the data (or reference) cell, it is impossible to remove their effect as long as we

|                              | [8]                | [10]                 | [12]  | This Work                            |
|------------------------------|--------------------|----------------------|-------|--------------------------------------|
| CMOS technology              | 45 nm              | 45 nm                | 65 nm | 130 nm                               |
| Sensing time (ns)            | 6.4                | 1.0                  | 1.95  | 4.5                                  |
| Sensing current (µA)         | 20                 | 30                   | 43    | 36                                   |
| Read energy (fJ)             | 396                | 73                   |       | 491                                  |
| Scope of offset cancellation | Signal development | Signal amplification | None  | Signal development and amplification |

Table 2. Comparison between proposed circuit and other circuits

adhere to the 1-Transistor 1-MTJ cell structure.

It should be noted that the absolute error rates obtained from the Monte-Carlo simulations in this work are valid only for a particular implementation of the circuit. For example, they can change if a different technology is employed. However, it can be argued that the relative merit of the proposed circuit will be maintained even when more advanced CMOS and MTJ technologies are employed.

Table 2 compares the results of this work with those of previously published works in terms of the CMOS technology used, sensing time, sensing current, read energy, and the scope of offset cancellation. In Table 2, the sensing time is defined as the time from the start of the sensing operation to the start of the latch operation [8, 12]. The sensing time and read energy consumption of the proposed circuit were measured by simulations to be 4.5 ns and 491 fJ, respectively.

In Table 2, we can observe that the proposed circuit has a larger read energy than the other circuits and a longer sensing time than all but the OCTS circuit [8]. However, in the scope of the offset cancellation, the proposed circuit is superior to the other circuits. The OCTS read circuit [8] can only remove the offset in the signal development stage and still suffers from the offsets in the signal amplification stage and/or of the latch. LOC-SA [10] makes it possible to ignore the offset of the latch by offset-free signal amplification but still suffers from the offsets in the signal development stage.

Because the proposed circuit stores the mismatch information during phase 1, which occupies almost half of the total sensing time, it is natural that it has a long sensing time and a large read energy. In our case, this was exacerbated by the fact that we had to use an unnecessarily large  $C_0 = 11.6$  fF. If a smaller  $C_0$  could be used in a more advanced process specifically designed for STT-MRAMs, the increase of the sensing time and energy consumption could be greatly reduced. Most importantly, the proposed circuit achieves a significantly lower read error-rate at the cost of a modest increase in time and energy. The proposed STT-MRAM read-circuit can be valuable when the offset ultimately limits the performance of a read-circuit.

# V. CONCLUSIONS

In this paper, we proposed a read-circuit which can cancel the effects of device mismatches. In the proposed circuit, the STT-MRAM read signal is generated by applying an identical amount of current to the data cell and the reference cell sequentially, cancelling mismatches in the sensing circuit using a capacitor. The read signal is then pre-amplified by a positive feedback loop. We implemented the proposed circuit in a 130 nm CMOS technology and confirmed the effectiveness of this offset-free pre-amplification with the current sourcing scheme by SPICE-level Monte-Carlo simulations.

# ACKNOWLEDGMENTS

The CAD tools used in this work were supported by IDEC, Korea.

# REFERENCES

- [1] C. Chappert, A. Fert, and F. N. Van Dau, "The emergence of spin electronics in data storage," *Nature Materials*, Vol. 6, No. 11, pp. 813–823, Nov. 2007.
- [2] J. Li et al., "Design Paradigm for Robust Spin-Torque Transfer Magnetic RAM (STT MRAM) From Circuit/Architecture Perspective," *IEEE Trans. on Very Large Scale Integration (VLSI) Systems*, Vol. 18, No. 12, pp. 1710–1723, Dec. 2010.
- [3] Y. Zhang, W. Wen, and Y. Chen, "The Prospect of STT-RAM Scaling From Readability Perspective," *IEEE Trans. on Magnetics*, Vol. 48, No. 11, pp.
- [4] K. Kim and C. Yoo, "Macro-Model of Magnetic Tunnel Junction for STT-MRAM including Dynamic Behavior," *J. of Semiconductor Technol. and Science (JSTS)*, Vol. 14, No. 6, pp. 728–732, Dec. 2014.
- [5] K. Shin, S. Im, and S. Park, "Low-Power Write-Circuit with Status-Detection for STT-MRAM," *J. of Semiconductor Technol. and Science (JSTS)*, Vol. 16, No. 1, pp. 23–30, Feb. 2016.
- [6] J. Choi et al., "Novel Self-Reference Sense Amplifier for Spin-Transfer-Torque Magneto-Resistive Random Access Memory," *J. of Semiconductor Technol. and Science (JSTS)*, Vol. 16, No. 1, pp. 31–38, Feb. 2016.
- [7] W. Kang et al., "High reliability sensing circuit for deep submicron spin transfer torque magnetic random access memory," *Electronics Letters*, Vol. 49, No. 20, pp. 1283–1285, Sep. 2013.
- [8] T. Na et al., "An Offset-Canceling Triple-Stage Sensing Circuit for Deep Submicrometer STT-RAM," *IEEE Trans. on Very Large Scale Integration (VLSI) Systems*, Vol. 22, No. 7, pp. 1620–1624, Jul. 2014.
- [9] M. Jefremow et al., "Time-differential sense amplifier for sub-80mV bitline voltage embedded STT-MRAM in 40nm CMOS," *2013 IEEE Inter. Solid-State Circuits Conference Digest of Technical Papers*, pp. 216–217, Feb. 2013.
- [10] B. Song et al., "Latch Offset Cancellation Sense Amplifier for Deep Submicrometer STT-RAM," *IEEE Trans. on Circuits and Systems I: Regular Papers*, Vol. 62, No. 7, pp. 1776–1784, Jul. 2015.
- [11] D. Gogl et al., "A 16-Mb MRAM Featuring Bootstrapped Write Drivers," *IEEE J. of Solid-State Circuits*, Vol. 40, No. 4, pp. 902–908, Apr. 2005.
- [12] J. Kim et al., "A Novel Sensing Circuit for Deep Submicron Spin Transfer Torque MRAM (STT-MRAM)," *IEEE Trans. on Very Large Scale Integration (VLSI) Systems*, Vol. 20, No. 1, pp. 181–186, Jan. 2012.



**Dong-Gi Lee** was born in Daegu, Korea, in 1988. He received the B.S. degree from the Department of Electronics Systems Engineering, Hanyang University, Korea, in 2015. He is currently pursuing the M.S. degree in the Department of Electronics and Computer Engineering at Hanyang University, Korea. His interests include analog circuits and STT-MRAM circuits.



**Sang-Gyu Park** was born in Seoul, Korea. He received B.S. and M.S. degrees in electronics engineering from Seoul National University, Korea, in 1990 and 1992, respectively, and a Ph.D. degree in electrical and computer engineering from Purdue University, Indiana, US, in 1998. From 1998 to 2000, he was a Senior Technical Staff Member at AT&T Laboratories. In 2000, he joined the faculty of the Department of Electronics and Computer Engineering of Hanyang University, Seoul, Korea, where he is now a professor. His current research area is mixed-signal CMOS integrated circuit design focusing on STT-MRAM circuits and delta-sigma ADCs.