CRYOGENIC DRAM BASED MEMORY SYSTEM FOR SCALABLE QUANTUM COMPUTERS: A FEASIBILITY STUDY

Swamit Tannu, Doug Carmean, Moinuddin Qureshi

MEMSYS-2017

Why Quantum Computers?

❖ Quantum computers provide large speedups for problems in materials science, machine learning, and medicine

Quantum Computers enable solutions to important problems

[Figure: execution time vs. problem size for a classical computer and a quantum computer. Example: molecule and material simulations, annotated "Billion Years" and "Days".]

Qubits: Background

Quantum computers use quantum bits (qubits) to encode information

[Figure: classical bit vs. quantum bit, each drawn on a sphere]

❖ State of a classical bit: 1 or 0 (two points on the sphere)

❖ State of a quantum bit: any point on the sphere
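One way to make "any point on the sphere" concrete is the standard Bloch-sphere parameterization, where a polar angle theta and an azimuthal angle phi give the state cos(theta/2)|0> + e^{i phi} sin(theta/2)|1>. The sketch below is illustrative only; the function names `bloch_to_amplitudes` and `measurement_probabilities` are hypothetical and do not come from the talk.

```python
import cmath
import math


def bloch_to_amplitudes(theta, phi):
    """Return the amplitudes (a0, a1) of |0> and |1> for a Bloch-sphere point.

    theta: polar angle in [0, pi]; phi: azimuthal angle in [0, 2*pi).
    """
    a0 = complex(math.cos(theta / 2.0), 0.0)
    a1 = cmath.exp(1j * phi) * math.sin(theta / 2.0)
    return a0, a1


def measurement_probabilities(theta, phi):
    """Return the probabilities of measuring 0 and 1 for the given point."""
    a0, a1 = bloch_to_amplitudes(theta, phi)
    return abs(a0) ** 2, abs(a1) ** 2


if __name__ == "__main__":
    # The two poles correspond to the classical values 0 and 1;
    # every other point is a superposition of both.
    print(measurement_probabilities(0.0, 0.0))
    print(measurement_probabilities(math.pi / 2.0, 0.0))
```

The north and south poles reproduce the two classical states, which is the sense in which a classical bit occupies only two points of the same sphere.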

Organization of Quantum Computer

[Figure: a quantum computer made up of a control processor and qubits]

Control Processor: interface between qubits and programmer

Qubits are fickle

Qubits are kept at extremely low temperature (~20mK)

❖ No quantization: a small change in state leads to errors

❖ Room temperature: too noisy to operate

[Figure: classical bit (states 1 and 0) vs. quantum bit]

Today's Quantum Computer

[Figure: dilution refrigerator holding the qubits at 20mK; 5-qubit chip (IBM)]

Cryogenic Control Processor

[Figure: two placements of the control processor relative to qubits at 20mK]

❖ Control processor at 300K: large thermal gradient, metal wires, thermal leakage

❖ Control processor at 4K: superconducting wires, low leakage

A cryogenic control processor is essential for a scalable quantum computer (Ref: D. Carmean, ISCA’16 Keynote)

Memory for Quantum Computers

[Figure: quantum computer with control processor, qubits, program memory, and data memory]

❖ Program memory + data memory: stores the quantum executable, data, and ECC frames (~10s of GB)

❖ Memory must be kept at cryogenic temperature to avoid a large thermal gradient

❖ Josephson junction technology works at 4K but has limited memory density (only a few Mb)

Quantum computers require substantial memory capacity at cryogenic temperature

Does commodity DRAM work at cryogenic temperatures?

Goal: characterize DRAM at cryogenic temperature to understand its functionality and error patterns

Why Memory Fails at Cryogenic Temperature?

[Figure: chain linking temperature, carriers (e-), threshold voltage, and faults]

At low temperatures, carrier freeze-out can increase the threshold voltage, which worsens the error rate

Minimum Operational Temperature (MOT): minimum temperature for fault-free operation
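The MOT definition can be sketched as a small routine over a temperature sweep. This is a minimal illustration, assuming the MOT is the lowest tested temperature at which, and above which, no faults were seen; that reading, the function name, and the sample sweep are assumptions, not measurements from the talk.

```python
def minimum_operational_temperature(sweep):
    """Return the MOT for a sweep of (temperature_kelvin, fault_count) pairs.

    Assumption: the MOT is the lowest tested temperature T such that no
    faults were observed at T or at any warmer tested temperature.
    Returns None if the warmest tested point already shows faults.
    """
    mot = None
    for temperature, faults in sorted(sweep, reverse=True):  # warmest first
        if faults > 0:
            break
        mot = temperature
    return mot


if __name__ == "__main__":
    # Hypothetical sweep, not data from the study.
    sweep = [(300, 0), (150, 0), (110, 0), (100, 3), (90, 0), (80, 12)]
    print(minimum_operational_temperature(sweep))
```

Walking from the warm end down stops a fault-free reading below the first failure (90K in the hypothetical sweep) from being reported as the MOT.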

EXECUTIVE SUMMARY

❖ Why Cryogenic DRAM?

❖ Experimental Setup & Challenges

❖ Observations


How to Test DRAM at Cryogenic Temperature?

❖ Conventional memory testing: Memtest86 running on a host, or dedicated memory testers

❖ Host machines and memory testers do not work at cryogenic temperatures

Need a mechanism to reduce DIMM temperature without affecting the tester

Isolated Cooling of DIMM

❖ Need cryogenic coolant: liquid nitrogen (boils at 77K)

❖ Need isolated cooling of DIMMs: compact cryogenic heatsink

❖ DIMM is sandwiched between two heatsinks and can be cooled down to 80K

Compact heatsink with liquid nitrogen provides isolated cooling of a DIMM

Challenges: Thermal Shock & Ice Condensation

[Figure: temperature (K) vs. time as the DIMM cools from 300K to 80K, annotated with thermal shock and condensation]

Limit rate of cooling & use isolation chamber to reduce condensation
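Limiting the cooling rate can be pictured as a stepped setpoint schedule. The sketch below is only an illustration: the talk gives no cooling rate, so the 10 K/min used in the example, the one-minute step, and the function name `cooling_schedule` are all hypothetical.

```python
def cooling_schedule(start_k, target_k, max_rate_k_per_min, step_min=1.0):
    """Return (time_min, setpoint_k) pairs that never cool faster than the limit.

    All parameters are illustrative; the talk does not state a cooling rate.
    """
    if max_rate_k_per_min <= 0 or step_min <= 0:
        raise ValueError("rate and step must be positive")
    if target_k > start_k:
        raise ValueError("target must not be warmer than start")

    time_min, temp = 0.0, float(start_k)
    schedule = [(time_min, temp)]
    while temp > target_k:
        # Drop by at most max_rate * step, clamping at the target.
        temp = max(float(target_k), temp - max_rate_k_per_min * step_min)
        time_min += step_min
        schedule.append((time_min, temp))
    return schedule


if __name__ == "__main__":
    # Hypothetical: 300K down to 80K at no more than 10 K/min.
    for point in cooling_schedule(300, 80, 10)[:3]:
        print(point)
```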

Experimental Methodology

❖ Verify memory functionality using march tests

❖ Fault: a single-bit fault in a burst

❖ MOT: minimum temperature at which no faults are observed

Number of DIMMs     55
Number of chips     750
Number of vendors   5
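The slide does not say which march test was used. As a sketch of the general technique, the code below runs March C-, a widely used march algorithm, over a simulated bit-addressable memory. The `SimulatedMemory` class and its stuck-at fault model are hypothetical stand-ins for a real DIMM and tester.

```python
class SimulatedMemory:
    """Bit-addressable memory with optional stuck-at faults (illustration only)."""

    def __init__(self, size, stuck_at=None):
        self.cells = [0] * size
        self.stuck_at = dict(stuck_at or {})  # address -> forced bit value

    def write(self, addr, bit):
        self.cells[addr] = self.stuck_at.get(addr, bit)

    def read(self, addr):
        return self.stuck_at.get(addr, self.cells[addr])


def march_c_minus(mem, size):
    """Run March C- and return the sorted addresses that returned wrong data."""
    faulty = set()
    up = range(size)
    down = range(size - 1, -1, -1)

    def element(order, expected, new_value):
        # One march element: optionally read-and-check, then optionally write.
        for addr in order:
            if expected is not None and mem.read(addr) != expected:
                faulty.add(addr)
            if new_value is not None:
                mem.write(addr, new_value)

    element(up, None, 0)   # any order: write 0
    element(up, 0, 1)      # ascending: read 0, write 1
    element(up, 1, 0)      # ascending: read 1, write 0
    element(down, 0, 1)    # descending: read 0, write 1
    element(down, 1, 0)    # descending: read 1, write 0
    element(up, 0, None)   # any order: read 0
    return sorted(faulty)


if __name__ == "__main__":
    mem = SimulatedMemory(64, stuck_at={5: 1, 40: 0})
    print(march_c_minus(mem, 64))
```

Repeating such a pass at each temperature step and recording the faulty addresses yields the per-temperature fault counts that the MOT definition relies on.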

Minimum Operational Temperature for DIMMs

[Figure: minimum operating temperature (K), vertical axis from 70 to 170, plotted for the tested DIMMs (horizontal axis from 0 to 60), with markers at 18%, 55%, 90%, and 100% and an annotation "DIMM Failure!"]

18% of DIMMs are functional below 90K

Chip Failures

[Figure: functional chips 92%, faulty chips 8%]

92% of chips worked at cryogenic conditions: pick cryogenic-tolerant chips

Min Operational Temperature Vs Chip Capacity

[Figure: minimum operating temperature (K), vertical axis from 70 to 170, for chips grouped by capacity: 256 Mb, 512 Mb, 1 Gb, 2 Gb, 4 Gb]

MOT increases with capacity; capacity of chip is correlated to technology node

Fault Granularity

❖ Single-bit errors: uncorrelated faults

❖ Linear codes (SECDED, BCH) are still effective

Fault type          Share of faults
Single-bit fault    99.985%
Double-bit fault    0.015%

Uncorrelated faults: conventional ECC can be effective for cryo DRAM
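To illustrate why single-bit, uncorrelated faults suit SECDED, here is a toy extended Hamming (8,4) code: it corrects any single-bit error and detects any double-bit error. This is a sketch of the principle only; DRAM ECC typically protects much wider words, and the function names and bit layout here are choices made for this example, not details from the talk.

```python
def encode(nibble):
    """Encode 4 data bits into an 8-bit SECDED codeword (list of bits).

    Index 0 holds the overall parity bit; indices 1..7 follow the usual
    Hamming(7,4) layout with parity bits at positions 1, 2, and 4.
    """
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]
    code[2] = code[3] ^ code[6] ^ code[7]
    code[4] = code[5] ^ code[6] ^ code[7]
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (nibble, status); nibble is None when the error is uncorrectable."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    overall = sum(code) % 2  # 0 when overall parity is consistent

    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: treat as a single error at index `syndrome`
        # (syndrome 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Even number of flips with a nonzero syndrome: double-bit error.
        return None, "uncorrectable"

    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status


if __name__ == "__main__":
    word = encode(0b1011)
    word[6] ^= 1  # inject a single-bit fault
    print(decode(word))
```

With 99.985% of observed faults being single-bit, a code of this class would correct nearly all of them and flag the rare double-bit cases instead of silently returning wrong data.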

Transient Vs Permanent Faults

❖ Repeated faulty addresses = permanent error

❖ Unique faulty address = transient error

DRAM type    Transient    Permanent
DDR2         36%          64%
DDR3         41%          59%
DDR4         53%          47%

Permanent faults: conventional sparing techniques can be used
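The classification rule on this slide (an address that fails repeatedly is permanent, an address that fails once is transient) can be sketched as follows. The function name and sample addresses are hypothetical, and the slide does not say whether its percentages count addresses or fault events.

```python
from collections import Counter


def classify_faults(faulty_addresses):
    """Split observed faulty addresses into permanent and transient sets.

    faulty_addresses: one entry per observed fault, across all test passes.
    An address seen more than once is treated as permanent; an address
    seen exactly once is treated as transient.
    """
    counts = Counter(faulty_addresses)
    permanent = {addr for addr, n in counts.items() if n > 1}
    transient = {addr for addr, n in counts.items() if n == 1}
    return permanent, transient


if __name__ == "__main__":
    # Hypothetical fault log, not data from the study.
    observed = [0x10, 0x2A, 0x10, 0x7F, 0x10, 0x33, 0x2A]
    permanent, transient = classify_faults(observed)
    print(sorted(permanent), sorted(transient))
```

Addresses in the permanent set are the natural candidates for sparing, since they fail the same way on every pass.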

Conclusion

❖ Quantum computers need dense memory at low temperature

❖ Does DRAM Work at cryogenic temperature?

❖ Experiments show most commodity DRAM chips work at 90K

❖ Error patterns are amenable to existing fault tolerance techniques


Questions?

Want to know more about quantum computers?

Please visit my paper presentation at MICRO 2017 in 2 weeks in Boston!

Backup slides


Most Chips Work at 80K

[Figure: faulty chips per DIMM, with categories functional chips, one faulty chip, two faulty chips, three faulty chips, and four faulty chips; visible percentage labels: 92% and 7%]

Only 8% of chips fail at 80K: pick cryo-tolerant chips