

264 • 2005 IEEE International Solid-State Circuits Conference 0-7803-8904-2/05/$20.00 ©2005 IEEE.

ISSCC 2005 / SESSION 14 / TD: LOW-POWER WIRELESS AND ADVANCED INTEGRATION / 14.5

14.5 A 195Gb/s 1.2W 3D-Stacked Inductive Inter-Chip Wireless Superconnect with Transmit Power Control Scheme

Noriyuki Miura¹, Daisuke Mizoguchi¹, Mari Inoue¹, Hiroo Tsuji¹, Takayasu Sakurai², Tadahiro Kuroda¹

¹Keio University, Yokohama, Japan
²The University of Tokyo, Tokyo, Japan

A 160Gb/s interface between a CPU chip and a DRAM chip using micro-bump technology has already been proposed [1]. This paper presents a wireless technology based on inductive coupling that outperforms the micro-bump technology. Figure 14.5.1 summarizes the performance of the proposed scheme and compares it with [1]. An aggregate data rate of 195Gb/s is achieved. Using the proposed technique, up to 4 chips stacked in a package can communicate at 195Gb/s, regardless of whether they are placed face-up or face-down. The technology in [1] can be applied only to 2 face-to-face chips. When the proposed technique is used for communication between two chips, the chip thickness is reduced to 10µm; the communication distance between transceivers on the two chips, including the thickness of the glue, is 15µm. The proposed transmitter circuit consumes 60% less power than the conventional circuit. Furthermore, the transmit power is controlled dynamically according to the communication distance to reduce both power dissipation and crosstalk. Compared to the implementation in [1], even though the proposed design is implemented in a less advanced technology, the area is reduced by a factor of 9 and the channel pitch is reduced by 10µm (from 60µm to 50µm). Since the proposed technique is a circuit solution in standard CMOS, it has several advantages over techniques based on mechanical contact, such as micro-bump and through-silicon-via technologies. These advantages concern the known-good-die problem, reliability, cost, and scaling.

A test chip is designed and fabricated in 0.25µm CMOS. Figure 14.5.2 is a micrograph of a stack of test chips. 195 transceivers are arranged in a 3×65 array with a 50µm channel pitch. In each transceiver, the transmitter (TX) and receiver (RX) circuits are placed under the inductors to save layout area. For crosstalk reduction, techniques such as space division multiple access (SDMA) and time division multiple access (TDMA), which are described in [3], are employed. The upper chip is rotated by 180 degrees for probe arrangement.
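The array figures quoted above can be cross-checked with a few lines of arithmetic. The sketch below assumes 1Gb/s per channel (the rate demonstrated in the measurements) and treats the array footprint as rows × pitch by columns × pitch; the function names are illustrative only.

```python
# Geometry and aggregate rate of the 3x65 transceiver array (values from the text).
ROWS = 3
COLS = 65
PITCH_UM = 50.0          # channel pitch in micrometres
RATE_PER_CH_GBPS = 1.0   # assumed per-channel rate (1Gb/s demonstrated)


def channel_count(rows: int = ROWS, cols: int = COLS) -> int:
    """Number of transceiver channels in the array."""
    return rows * cols


def aggregate_rate_gbps(rows: int = ROWS, cols: int = COLS,
                        rate: float = RATE_PER_CH_GBPS) -> float:
    """Aggregate data rate if every channel runs at `rate` Gb/s."""
    return channel_count(rows, cols) * rate


def array_area_mm2(rows: int = ROWS, cols: int = COLS,
                   pitch_um: float = PITCH_UM) -> float:
    """Footprint of the array, assuming one pitch-by-pitch cell per channel."""
    width_mm = rows * pitch_um / 1000.0    # 0.15 mm
    length_mm = cols * pitch_um / 1000.0   # 3.25 mm
    return width_mm * length_mm


if __name__ == "__main__":
    print(channel_count(), "channels")
    print(aggregate_rate_gbps(), "Gb/s aggregate")
    print(round(array_area_mm2(), 4), "mm^2")
```

Under these assumptions the footprint comes to roughly 0.49mm², consistent with the 0.5mm² total area reported in the performance summary.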

The upper chip is polished to various thicknesses prior to assembly to perform measurements over several communication distances, X. The measured communication distances are 15, 28, 36, 43, and 59µm, which correspond to the cases where 2, 3, 3, 4, and 5 chips are stacked. Power and ground are provided by DC probes, and a 1GHz clock signal, Clk, is provided to the chips by AC probes. The transmitted data of the center channel in the array, Txdata, is generated by an external signal source, and the received data, Rxdata, is monitored through AC probes. Figure 14.5.3 presents measured waveforms of Clk, Txdata, and Rxdata when X=15µm. A data rate of 1Gb/s is demonstrated.

A single-ended transmitter circuit, depicted in Fig. 14.5.4, is proposed for power reduction. Since the inductor current I_T is charged by a series capacitor and reused for generating the next pulse of opposite polarity, the single-ended transmitter can generate the I_T signal twice with the same power dissipation as the conventional H-bridge circuit in [2]. Furthermore, the delay buffers and one of the output inverters are removed. As a result, power dissipation is reduced by 60%. The receiver circuit of [2] is employed; it dissipates 2.2mW at 1Gb/s.

In addition, the transmit power, P_TX, is controlled in 15 levels by a transmit power control register, as depicted in Fig. 14.5.5, so that P_TX can be set to the minimum power level required for a given communication distance. The minimum transmit power for each communication distance is measured and presented in a shmoo plot in Fig. 14.5.5. The required transmit power for X=15µm, 28µm, and 43µm is 4mW, 9mW, and 19mW, respectively.
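As a rough illustration of how such a setting might be chosen, the sketch below looks up the smallest measured P_TX that covers a given distance. Only the three (distance, power) pairs come from the text; rounding up to the next measured distance, and modelling the register as four bits that each enable one of the widths w=1, 2, 4, 8 labelled in Fig. 14.5.5, are assumptions made for this example.

```python
from typing import Optional

# Measured minimum transmit power (mW) versus communication distance (um),
# as quoted in the text for 1Gb/s operation.
MEASURED_MIN_PTX_MW = {15: 4.0, 28: 9.0, 43: 19.0}

# Widths labelled in Fig. 14.5.5; one register bit per width is an assumption.
WIDTHS = (1, 2, 4, 8)


def min_transmit_power_mw(distance_um: float) -> Optional[float]:
    """Return the measured P_TX of the nearest distance >= distance_um.

    Rounding up to the next measured point is a conservative choice made
    here, not something the paper specifies. Returns None beyond 43um,
    where the text reports no working power level.
    """
    for d in sorted(MEASURED_MIN_PTX_MW):
        if distance_um <= d:
            return MEASURED_MIN_PTX_MW[d]
    return None


def relative_drive(code: int) -> int:
    """Sum of the enabled widths for a 4-bit register code (hypothetical model)."""
    if not 0 <= code <= 15:
        raise ValueError("code must fit in 4 bits")
    return sum(w for bit, w in enumerate(WIDTHS) if code & (1 << bit))


if __name__ == "__main__":
    for x in (15, 20, 43, 59):
        print(x, "um ->", min_transmit_power_mw(x), "mW")
```

With four binary-weighted widths, codes 1 through 15 give 15 distinct non-zero drive strengths, which matches the 15 levels mentioned in the text.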

The transmit power control is effective not only for power reduction but also for crosstalk reduction in parallel communications. If a transmitter transmits more power than necessary, crosstalk increases in the nearby chips, and hence the bit error rate (BER) increases. Figure 14.5.6 depicts measured shmoo plots (a) without the power control and (b) with the power control. The center channel in the array in the lowest chip transmits Txdata, and the other 194 surrounding channels transmit 2^31-1 pseudo random binary sequence (PRBS) data generated by 4 on-chip 31b linear feedback shift registers (LFSRs). The number of surrounding channels that are activated is digitally selected. The received Rxdata is monitored through a center channel in the upper chips. If transmit power is not controlled, the aggregate data rate is reduced to 39Gb/s with a 3×13 channel array, which is only 20% of the throughput when transmit power is controlled.
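For readers unfamiliar with PRBS generation, a 31-bit LFSR of the kind mentioned above can be sketched as follows. The paper does not state the feedback polynomial; this example assumes the commonly used PRBS31 polynomial x^31 + x^28 + 1, and the function names are hypothetical.

```python
from itertools import islice
from typing import Iterator, List

MASK31 = (1 << 31) - 1  # 31-bit shift register


def prbs31(seed: int = MASK31) -> Iterator[int]:
    """Yield bits from a 31-bit Fibonacci LFSR.

    Taps correspond to x^31 + x^28 + 1 (an assumption; the paper only says
    that 31b LFSRs generate a 2^31-1 sequence).
    """
    state = seed & MASK31
    if state == 0:
        raise ValueError("seed must be non-zero: the all-zero state is a lock-up state")
    while True:
        # XOR of register stages 31 and 28 (bit indices 30 and 27).
        new_bit = ((state >> 30) ^ (state >> 27)) & 1
        state = ((state << 1) | new_bit) & MASK31
        yield new_bit


def first_bits(n: int, seed: int = MASK31) -> List[int]:
    """Convenience helper returning the first n bits of the sequence."""
    return list(islice(prbs31(seed), n))


if __name__ == "__main__":
    print("".join(str(b) for b in first_bits(64)))
```

Each output bit equals the XOR of the bits produced 28 and 31 steps earlier, which is a quick way to check a captured stream against this assumed polynomial.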

Based on the measurement and analysis reported in [3], increasing the distance attenuates the crosstalk more rapidly than increasing the number of channels raises it. Therefore, BER will hardly increase when the channel array is larger than 3×65. An aggregate data rate of 390Gb/s/mm² is quite possible. For multi-channel bus communications, the transmit power should be large enough to communicate with the chip at the farthest distance. The receiver sensitivity should be controlled in each chip, or the space between channels should be enlarged, to reduce the crosstalk.
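The density and throughput figures in this part of the paper follow from simple ratios. The short sketch below reproduces them, taking the 0.5mm² total area from the performance summary and assuming 1Gb/s per channel; it is a sanity check, not part of the authors' analysis.

```python
# Throughput figures quoted in the text, recomputed from first principles.
RATE_PER_CH_GBPS = 1.0   # assumed per-channel data rate
TOTAL_AREA_MM2 = 0.5     # total area from the performance summary


def aggregate_gbps(rows: int, cols: int, rate: float = RATE_PER_CH_GBPS) -> float:
    """Aggregate rate of a rows x cols channel array."""
    return rows * cols * rate


def density_gbps_per_mm2(aggregate: float, area_mm2: float = TOTAL_AREA_MM2) -> float:
    """Aggregate data rate per unit area."""
    return aggregate / area_mm2


if __name__ == "__main__":
    controlled = aggregate_gbps(3, 65)     # with transmit power control
    uncontrolled = aggregate_gbps(3, 13)   # without transmit power control
    print(density_gbps_per_mm2(controlled), "Gb/s/mm^2")
    print(uncontrolled / controlled, "fraction of controlled throughput")
```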

It is reported in [4] that substrate thickness can be reduced to 1.7µm without affecting transistor characteristics. Based on a simulation study, if the communication distance is reduced to 10µm and a 90nm CMOS technology is employed, an aggregate data rate of 1.5Tb/s/mm² can be achieved with a power dissipation of 5W.

Acknowledgement:
The VLSI chip in this study is fabricated in the chip fabrication program of VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with MOSIS and Taiwan Semiconductor Manufacturing Company (TSMC). The authors are grateful to M. Tago, M. Fukaishi, and Y. Nakagawa for their assistance in measurement.

References:
[1] T. Ezaki et al., “A 160Gb/s Interface Design Configuration for Multichip LSI,” ISSCC Dig. Tech. Papers, pp. 140-141, Feb. 2004.
[2] D. Mizoguchi et al., “A 1.2Gb/s/pin Wireless Superconnect Based on Inductive Inter-chip Signaling (IIS),” ISSCC Dig. Tech. Papers, pp. 142-143, Feb. 2004.
[3] N. Miura et al., “Cross Talk Countermeasures in Inductive Inter-chip Wireless Superconnect,” Proc. CICC, pp. 99-102, Oct. 2004.
[4] T. Ohguro et al., “Ultra-Thin Chip with Permalloy Film for High Performance MS/RF CMOS,” Symp. VLSI Technology, pp. 220-221, June 2004.


ISSCC 2005 / February 8, 2005 / Salon 1-6 / 3:15 PM

Figure 14.5.1: Performance summary and comparison.

Figure 14.5.2: Micrograph of a stack of test chips and transceiver channel array.

Figure 14.5.3: Measured transmitted and received data at 1Gb/s.

Figure 14.5.4: Measured power dissipation of transceiver circuit.

Figure 14.5.5: Transmit power control scheme and measured minimum transmit power dependence on communication distance.

Figure 14.5.6a: Measured shmoo plot on number of channels and communication distance, without transmit power control.

Performance Summary and Comparison

                                    This Work                       Ref. [1]
Interconnect                        Wireless (Inductive Coupling)   Wired (Micro Bump)
Number of Chips that Communicate    2 Chips / 3 Chips / 4 Chips     2 Chips (Face-to-Face Only)
Communication Distance              15µm / 30µm / 45µm              No Data
Aggregate Data Rate                 195Gb/s                         160Gb/s
Total Power Dissipation             1.2W / 2.2W / 4.1W              No Data
Total Area                          0.5mm²                          4.5mm²
Channel Pitch                       50µm                            60µm
Process                             CMOS 0.25µm                     CMOS 0.15µm
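The total power figures in the summary appear consistent with a simple per-channel budget: 195 channels, each dissipating the receiver power of 2.2mW plus the minimum transmit power for the given stack. The sketch below checks this. The decomposition, and the pairing of the 2-, 3-, and 4-chip cases with the measured 4mW, 9mW, and 19mW, are inferences made for this example rather than statements from the paper.

```python
# Per-channel power budget: total = channels * (P_TX + P_RX).
CHANNELS = 195
P_RX_MW = 2.2  # receiver power at 1Gb/s, from the text

# Assumed pairing of stack size with measured minimum transmit power (mW).
P_TX_MW_BY_CHIPS = {2: 4.0, 3: 9.0, 4: 19.0}


def total_power_w(p_tx_mw: float, p_rx_mw: float = P_RX_MW,
                  channels: int = CHANNELS) -> float:
    """Total dissipation in watts for `channels` identical transceivers."""
    return channels * (p_tx_mw + p_rx_mw) / 1000.0


def energy_per_bit_pj(p_tx_mw: float, p_rx_mw: float = P_RX_MW,
                      rate_gbps: float = 1.0) -> float:
    """Energy per bit per channel in picojoules (mW divided by Gb/s gives pJ/bit)."""
    return (p_tx_mw + p_rx_mw) / rate_gbps


if __name__ == "__main__":
    for chips, p_tx in sorted(P_TX_MW_BY_CHIPS.items()):
        print(chips, "chips:", round(total_power_w(p_tx), 1), "W,",
              round(energy_per_bit_pj(p_tx), 1), "pJ/bit")
```

Rounded to one decimal place, this budget gives 1.2W, 2.2W, and 4.1W, matching the summary.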

[Figure 14.5.2 labels: transceiver channel array, 50µm pitch, 3×65 channels, 0.15mm × 3.25mm; DC probes and AC probes for the upper and lower chips; 48µm channel containing Tx and Rx.]

[Figure 14.5.3 labels: waveforms of Clk, Txdata, and Rxdata at communication distance X=15µm; time scale 1ns.]

[Figure 14.5.4 labels: H-bridge transmitter [2] and single-ended transmitter (this work), each driving I_T from Txdata; plot of power [mW] (0 to 18) versus data rate [Gb/s] (0 to 1.2) for the receiver [2], the H-bridge [2], and the single-ended transmitter (this work), SPICE and measurement; annotation -60%.]

[Figure 14.5.5 labels: transmit power control register selecting driver widths w=1, w=2, w=4, and w=8; shmoo plot (PASS/FAIL) of transmit power P_TX [mW] (0 to 20) versus communication distance X [µm] (15, 28, 36, 43, 59) at 1Gb/s.]

[Figure 14.5.6a labels: shmoo plot (PASS/FAIL) of communication distance X [µm] (15, 28, 36, 43, 59; 1st to 5th) versus number of channels (3×1 to 3×65) at 1Gb/s; channel under test at the center of the channel array; without transmit power control, transmit power 19mW.]


Figure 14.5.6b: Measured shmoo plot on number of channels and communication distance, with transmit power control.

[Figure 14.5.6b labels: shmoo plot (PASS/FAIL) of communication distance X [µm] (15, 28, 36, 43, 59; 1st to 5th) versus number of channels (3×1 to 3×65) at 1Gb/s; channel under test at the center of the channel array; with transmit power control, transmit power 19mW, 9mW, and 4mW.]