ECE5917 SoC Architecture: Introductioncontents.kocw.net/KOCW/document/2014/sungkyunkwan/hanta... · 2016. 9. 9. · SoC Architecture: Introduction Tae Hee Han: [email protected] Semiconductor

ECE5917 SoC Architecture: Introduction

Tae Hee Han: [email protected]

Semiconductor Systems Engineering

Sungkyunkwan University

Course Information

• Objectives
  - Educate competitive SoC engineers who combine a system-level perspective with an understanding of the market

• Lecture schedule
  - Wed. 13:30 ~ 16:15

• References for this course
  - Liming Xiu, VLSI Circuit Design Methodology Demystified, Wiley-Interscience, 2008
  - Hennessy & Patterson, Computer Architecture: A Quantitative Approach, 5th ed., Morgan Kaufmann, 2011
  - ARM processor architecture and AMBA bus manuals
  - Jacob et al., Memory Systems, Morgan Kaufmann, 2008


Course Schedule

Schedule     Contents
Week 1       Basic Concepts, Introduction
Week 2       Case Study: System
Week 3       Embedded System: Hardware/Software Interface
Week 4~5     ARM Processor and AMBA Bus System
Week 6~7     Memory and Peripheral Interface
Week 8       Midterm Exam
Week 9~10    MPSoC (Multiprocessor SoC)
Week 11      Poster Presentation
Week 12~14   On-Chip Network
Week 15      Poster Presentation
Week 16      Final Exam


Grading System

• Homework: 20%
• Attendance: 10%
• Midterm: 25%
• Final: 25%
• Poster (10%) + Presentation (10%): 20%


Outline

• Historical Perspective of IC and Issues
• What is SoC?
• Traditional Design Flow
• SoC Design


IC: Historical Background & Issues

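The Invention of the Transistor

• John Bardeen, Walter Brattain & William Shockley (Bell Labs) invented the transistor. The first device, a point-contact transistor, was demonstrated in December 1947 and announced in 1948. Shockley conceived the bipolar junction transistor in 1948.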

The Invention of the Integrated Circuit

• Jack Kilby (Texas Instruments, 1958) and Robert Noyce (Fairchild Semiconductor, 1959) independently invented the integrated circuit.

Jack S. Kilby, winner of the 2000 Nobel Prize in Physics

Two bipolar transistors connected on the same substrate by bonding wires.

Moore’s Law (1965)

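• The single most important guideline in microprocessor fabrication and architecture:
  1. “The number of transistors per chip will double every 18 months.” Moore’s 1965 paper projected a doubling every 12 months, and he revised this to 24 months in 1975; 18 months is the figure most often quoted.
  2. “As the sophistication of chips goes up, the cost of [fabrication plants] goes up exponentially.”
• Together, the two statements describe a cost-integration relation.
• Both have held true for more than four decades.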
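The doubling periods above lead to very different projections. The following is a minimal Python sketch. It assumes an idealized, constant doubling period, and the function and parameter names are illustrative rather than taken from the slides.

```python
def transistor_count(n0: float, years: float, doubling_months: float) -> float:
    """Project a transistor count after `years`, assuming a fixed doubling period."""
    return n0 * 2 ** (years * 12 / doubling_months)


if __name__ == "__main__":
    # Growth factor over one decade for the doubling periods quoted on the slide.
    for months in (12, 18, 24):
        factor = transistor_count(1, 10, months)
        print(f"doubling every {months} months -> x{factor:,.0f} in 10 years")
```

Over one decade, a 12-month period gives 1024×, 18 months gives about 102×, and 24 months gives 32×. The choice of period therefore changes the ten-year projection by a factor of 32.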

[Figure: Gordon Moore’s original graph from 1965 (source: www.intel.com; image: http://download.intel.com/museum/research/arc_collect/history_docs/pix/hoff1.jpg). Axes: relative manufacturing cost per component vs. number of components per integrated circuit.]

MOS Transistor Scaling (1974 to present)

[Figure: poly pitch (typical MPU/ASIC) and metal pitch (typical DRAM) shrink by S = 0.7× per technology node, i.e., 0.5× every two nodes. Source: 2001 ITRS Executive Summary, ORTC figure]

• Decreased transistor/feature sizes lead to:
  - Increased variability (tox, BEOL, DFM, SEU, etc.)
  - Short-channel effects and leakage power

(BEOL: Back-End-Of-Line; DFM: Design for Manufacturability; SEU: Single Event Upset)

Scaling - FEOL, BEOL

(FEOL: Front-End-Of-Line; BEOL: Back-End-Of-Line)

Ecosystem of Integrated Circuits


Performance, Cost and Power

Source: GSA (Nov. 2012)

• Performance: a lasting theme
• Cost: reducing cost while keeping performance
• Power: reducing power consumption while keeping performance and cost
• Balancing all three calls for state-of-the-art design skills, trade-offs, and compromise

Declining Designs

Source: IBS (2012)

Cost of a New Fab Increases Dramatically

Source: GSA (Nov. 2012)


No Cost-Effective Lithography Solutions

• High cost and limited availability of production EUV
• Integration of EUV and FinFET technology on 450mm wafers
• These issues are expected to drive 3D integration, which could have a major impact on extending Moore’s law

Source: IBM (2013)

Lithography roadmap by node:
• 45nm: immersion (ArFi)
• 32nm: 2nd-generation immersion
• 22nm: 3rd-generation ArFi with source mask optimization (SMO)
• 14nm: 4th-generation ArFi with SMO and double patterning (DPL)
• 10nm: 5th-generation ArFi with multilayer patterning, or EUV

Moore’s Law & More

[Figure: two axes of progress. Vertical, “More Moore” (scaling): 130nm, 90nm, 65nm, 45nm, 32nm, 22nm, 16nm, heading toward Beyond CMOS. Horizontal, “More than Moore” (functional diversification): analog/RF, HV power, passives, sensors, actuators, biochips. Combining both yields higher-value systems, ranging from multichip System-in-Package (SiP) to multicomponent-IC System-on-Chip (SoC). Source: JSTC, adapted from ITRS 2011]

Technology Scaling

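• A 30% scaling down in dimensions → doubles transistor density
• Power per transistor: Vdd scaling → lower power
• Transistor delay $t_{delay} = C_{gate} V_{dd} / I_{SAT}$: C_gate and Vdd scaling → lower delay

[Figure: MOSFET cross-section showing gate, source, drain, body, gate-oxide thickness tox, and channel length L]

Power: $P = \alpha C V_{dd}^2 f + V_{dd} I_{st} + V_{dd} I_{leak}$ (switching, short-circuit, and leakage components)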
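The scaling rules above can be checked with a few lines of arithmetic. This sketch assumes ideal constant-field (Dennard) scaling, in which dimensions, C_gate, Vdd and I_SAT all shrink by S = 0.7. It also assumes that clock frequency tracks 1/delay, and it ignores the short-circuit and leakage terms. The Node class and ideal_shrink function are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """Relative device parameters of one technology node (baseline = 1.0)."""
    length: float = 1.0   # feature size (both L and W)
    c_gate: float = 1.0   # gate capacitance
    vdd: float = 1.0      # supply voltage
    i_sat: float = 1.0    # saturation current
    alpha: float = 1.0    # activity factor

    def delay(self) -> float:
        # Transistor delay = C_gate * Vdd / I_SAT
        return self.c_gate * self.vdd / self.i_sat

    def dynamic_power(self) -> float:
        # Switching term alpha * C * Vdd^2 * f, with f assumed to be 1 / delay
        return self.alpha * self.c_gate * self.vdd ** 2 / self.delay()

    def area(self) -> float:
        return self.length ** 2


def ideal_shrink(node: Node, s: float = 0.7) -> Node:
    """Constant-field scaling: dimensions, C, Vdd and I_SAT all scale by s (assumption)."""
    return Node(node.length * s, node.c_gate * s, node.vdd * s, node.i_sat * s, node.alpha)


if __name__ == "__main__":
    old, new = Node(), ideal_shrink(Node())
    print(f"density          x{old.area() / new.area():.2f}")
    print(f"delay            x{new.delay() / old.delay():.2f}")
    print(f"power/transistor x{new.dynamic_power() / old.dynamic_power():.2f}")
    density_ratio = (new.dynamic_power() / new.area()) / (old.dynamic_power() / old.area())
    print(f"power density    x{density_ratio:.2f}")
```

Under these assumptions, density rises about 2.04× and delay drops to 0.7×. Power per transistor falls to 0.49×, so power density stays constant. Once Vdd stops scaling, this balance breaks and power density rises.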

Fundamental Trends

Year of high-volume manufacturing   2004  2006  2008  2010  2012  2014  2016  2018
Technology node (nm)                90    65    45    32    22    16    11    8
Integration capacity (BT)           2     4     8     16    32    64    128   256
Delay = CV/I scaling: 0.7, ~0.7, >0.7; delay scaling will slow down
Energy/logic op scaling: >0.35, >0.5, >0.5; energy scaling will slow down
Bulk planar CMOS: high probability, moving to low probability
Alternate (3G, etc.): low probability, moving to high probability
Variability: medium, high, very high
ILD (K): ~3, then <3; reduces slowly toward 2-2.5
RC delay: 1 1 1 1 1 1 1 1
Metal layers: 6-7, 7-8, 8-9; then 0.5 to 1 layer per generation

(BT: billions of transistors)
Source: Shekhar Borkar, Intel Corp.

2008 ITRS “Beyond CMOS” Definition Graphic


Computing and Data Storage Beyond CMOS

[Figure: device roadmap from “More Moore” to “Beyond CMOS” across the 32nm, 22nm, 16nm, 11nm, and 8nm nodes. Elements shown: baseline CMOS, ultimately scaled CMOS, functionally enhanced CMOS, channel replacement materials, low-dimensional material channels, multiple-gate MOSFETs, spin logic devices, nanowire electronics, ferromagnetic logic devices, new state variables, new data representations, new devices, new data processing algorithms. Source: Emerging Research Device Working Group]

[Figure: device structure research pipeline, with innovation and disruptive technology at each node. Nodes: 22/20 nm, 15/11 nm, 8 nm & beyond. Structures progress from the conventional planar device to FinFET and ETSOI, then to Si nanowire and C (carbon) electronics. Related elements: fully depleted devices, Si NW, HfO2, deposited Si. Source: IBM (2011)]

What Would Be the Limit of Downsizing?

[Figure: a 3 nm channel between source and drain, comparable to the tunneling distance]

Source: Hiroshi Iwai (Tokyo Institute of Technology, 2013)

Impact of Moore’s Law To Date

• Push the memory wall → larger caches
• Increase frequency → deeper pipelines
• Increase ILP → concurrent threads, branch prediction, and SMT
• Manage power → clock gating, activity minimization

[Figure: IBM Power5 die. Source: IBM]

Shaping Future Multicore Architectures

• The ILP wall
  - Limited ILP in applications
• The frequency wall
  - Not much headroom
• The power wall
  - Dynamic and static power dissipation
• The memory wall
  - Gap between compute bandwidth and memory bandwidth
• Manufacturing
  - Non-recurring engineering (NRE) costs
  - Time to market

The Frequency Wall

• Not much headroom is left in the stage-to-stage times (currently 8-12 FO4 delays per pipeline stage; FO4 is the delay of an inverter driving four copies of itself)
• Increasing frequency leads to the power wall

Vikas Agarwal, M. S. Hrishikesh, Stephen W. Keckler, Doug Burger. Clock rate versus IPC: the end of the road for conventional microarchitectures. In ISCA 2000
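To show why a few FO4 delays of headroom matter, the sketch below converts logic depth per pipeline stage into clock frequency. The 15 ps FO4 delay and the 2 FO4 of per-stage latch and clocking overhead are hypothetical values, chosen only for illustration.

```python
def clock_frequency_ghz(logic_fo4: float, overhead_fo4: float, fo4_ps: float) -> float:
    """Frequency when each stage holds `logic_fo4` FO4 of logic plus a fixed
    `overhead_fo4` FO4 of latch/skew/jitter overhead."""
    cycle_ps = (logic_fo4 + overhead_fo4) * fo4_ps
    return 1000.0 / cycle_ps  # 1000 ps per ns, so the result is in GHz


if __name__ == "__main__":
    FO4_PS = 15.0       # hypothetical FO4 delay of an illustrative process
    OVERHEAD_FO4 = 2.0  # hypothetical per-stage overhead
    for logic in (16, 12, 8, 6):
        f = clock_frequency_ghz(logic, OVERHEAD_FO4, FO4_PS)
        print(f"{logic:2d} FO4 of logic per stage -> {f:.2f} GHz")
```

The overhead is fixed, so halving logic depth from 16 to 8 FO4 raises frequency by only 1.8× (the cycle shrinks from 270 ps to 150 ps). The deeper pipeline also adds latches, power, and misprediction penalty.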


Options

• Increase performance via parallelism
  - On chip, this has been largely at the instruction/data level
• The 1990s through 2005 was the era of instruction-level parallelism
  - Single instruction multiple data (SIMD)/vector parallelism: MMX and other SIMD extensions, vector co-processors
  - Out-of-order (OOO) execution cores
  - Explicitly Parallel Instruction Computing (EPIC)
• Have we exhausted the options within a single thread?

The ILP Wall - Past the Knee of the Curve?

[Figure: performance vs. design “effort”. Moving from scalar in-order to moderate-pipe superscalar/OOO made sense (good ROI). Moving further to very-deep-pipe aggressive superscalar/OOO gives very little gain for substantial effort. Source: G. Loh]

The ILP Wall

• Limiting phenomena for ILP extraction:
  - Clock rate: at the wall, each increase in clock rate brings a corresponding CPI increase (branches, other hazards)
  - Instruction fetch and decode: at the wall, no more instructions can be fetched and decoded per clock cycle
  - Cache hit rate: poor locality can limit ILP, and it adversely affects memory bandwidth
  - ILP in applications: the serial fraction of applications
• Reality:
  - Limit studies cap IPC at 100-400 (using an ideal processor)
  - Current processors achieve an IPC of only 1-2
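Amdahl’s law quantifies the serial-fraction limit. It is a standard result, although the slide does not state it explicitly. A minimal sketch:

```python
def amdahl_speedup(serial_fraction: float, n_units: int) -> float:
    """Amdahl's law: overall speedup on n_units when serial_fraction of the work cannot run in parallel."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if n_units < 1:
        raise ValueError("n_units must be at least 1")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n_units)


if __name__ == "__main__":
    for n in (2, 4, 16, 1024):
        print(f"5% serial, {n:4d} units -> {amdahl_speedup(0.05, n):5.2f}x")
    print(f"upper bound as n grows: {1 / 0.05:.0f}x")
```

With a 5% serial fraction, 16 units give only about 9.1×, and no number of units can exceed 20×. This is why TLP has to exist for multithreaded and multicore designs to pay off.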


The ILP Wall: Options

• Increase the granularity of parallelism
  - Simultaneous multithreading (SMT) to exploit TLP
  - TLP has to exist; otherwise, poor utilization results
  - Coarse-grain multithreading
  - Throughput computing
• New languages/applications
  - Data-intensive computing in the enterprise
  - Media-rich applications

The Memory Wall

[Figure: performance (log scale, 1 to 1000) vs. time. Processor (µProc) performance improves 60%/yr (“Moore’s Law”) while DRAM improves only 7%/yr, so the processor-memory performance gap grows about 50%/yr.]
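The “grows 50% / year” label follows from compounding the two growth rates in the figure. A small sketch (the function name is illustrative):

```python
def performance_gap(years: float, cpu_growth: float = 0.60, dram_growth: float = 0.07) -> float:
    """Processor/DRAM performance ratio after `years`, with both normalized to 1 at year 0."""
    return ((1 + cpu_growth) / (1 + dram_growth)) ** years


if __name__ == "__main__":
    print(f"annual gap growth: {performance_gap(1) - 1:.1%}")
    for years in (5, 10, 20):
        print(f"after {years:2d} years the gap is about x{performance_gap(years):,.0f}")
```

Since 1.60 / 1.07 ≈ 1.495, the gap grows about 49.5% per year and compounds to roughly 56× over ten years.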

The Memory Wall

• Increasing the number of cores increases the demanded memory bandwidth
• What architectural techniques can meet this demand?

[Figure: average access time vs. year]

The Memory Wall

• On-die caches are both area-intensive and power-intensive
  - StrongARM dissipates more than 43% of its power in caches
  - Caches incur huge area costs
• Larger caches never deliver the near-universal performance boost offered by frequency ramping (Source: Intel)
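Average memory access time (AMAT), the standard metric in Hennessy & Patterson, shows why caches justify their area and power. The latencies and miss rates below are hypothetical:

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time of one cache level: hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


def amat_two_level(l1_hit: float, l1_miss_rate: float,
                   l2_hit: float, l2_local_miss_rate: float, memory_latency: float) -> float:
    """Two-level hierarchy: the L1 miss penalty is the AMAT of the L2 behind it."""
    return amat(l1_hit, l1_miss_rate, amat(l2_hit, l2_local_miss_rate, memory_latency))


if __name__ == "__main__":
    # Hypothetical latencies in CPU cycles.
    print("L1 only :", amat(1, 0.05, 200))
    print("L1 + L2 :", amat_two_level(1, 0.05, 10, 0.20, 200))
```

Consider a hypothetical 1-cycle L1 with a 5% miss rate in front of a 200-cycle DRAM: AMAT is 11 cycles. Adding a 10-cycle L2 that catches 80% of L1 misses cuts AMAT to 3.5 cycles.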

[Figure: die photos of the AMD dual-core Athlon FX (CPU0, CPU1) and IBM Power5]

The Power Wall

• Dynamic power per transistor scales linearly with frequency and quadratically with Vdd
• Lower Vdd (and thus slower transistors) can be compensated for with increased pipelining to keep throughput constant
• Power per transistor is not the same as power per area → power density is the problem!
• Multiple units can be run at lower frequencies to keep throughput constant while saving power
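The last bullet can be checked with the dynamic term αCV²f. This sketch assumes a perfectly parallel workload, so N units at f/N match the throughput of one unit. It also assumes the slower units can run at a hypothetical reduced supply voltage, and it holds capacitance and activity constant.

```python
def dynamic_power(c: float, vdd: float, f: float, alpha: float = 1.0) -> float:
    """Dynamic switching power alpha * C * Vdd^2 * f, in relative units."""
    return alpha * c * vdd ** 2 * f


def parallel_power_ratio(n_units: int, vdd_scale: float) -> float:
    """Power of n_units running at f/n_units and vdd_scale * Vdd, relative to one unit at (f, Vdd).

    Assumes a perfectly parallel workload, so throughput is equal in both cases.
    """
    single = dynamic_power(c=1.0, vdd=1.0, f=1.0)
    parallel = n_units * dynamic_power(c=1.0, vdd=vdd_scale, f=1.0 / n_units)
    return parallel / single


if __name__ == "__main__":
    print(f"2 units, 0.7x Vdd: {parallel_power_ratio(2, 0.7):.2f}x power")
    print(f"4 units, 1.0x Vdd: {parallel_power_ratio(4, 1.0):.2f}x power")
```

Two units at half frequency and 0.7× Vdd consume 0.49× the power for the same throughput. With no voltage reduction the ratio stays at 1.0, so the saving comes entirely from lowering Vdd.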



Improving Power/Performance

• Consider a constant die size and a decreasing core area each generation, i.e., more cores per chip
  - Lowering voltage and frequency → power reduction
  - Increasing cores per chip → performance increase

Better power/performance!

Special-Purpose Hardware (a.k.a. Accelerator)

TCP/IP Offload Engine (TOE): 2.23 mm × 3.54 mm, 260K transistors

Opportunities: network processing engines, MPEG encode/decode engines, speech engines

Special-purpose HW → best MIPS/Watt (Source: Shekhar Borkar, Intel Corp.)

Moore’s Law reinterpreted

• The number of cores per chip can double every two years
• Clock speed will not increase (and may decrease)
• Need to deal with systems with millions of concurrent threads
• Need to deal with inter-chip parallelism as well as intra-chip parallelism

The Economics of Manufacturing

• Where are the costs of developing the next generation of processors?
  - Design costs
  - Manufacturing costs
• What type of chip-level solutions does the economics imply?
• Assessing the implications of Moore’s Law is an exercise in mass production

Valued Performance: SoC (System-on-a-Chip)

• Special-purpose hardware → more MIPS/mm²

                      Die Area   Power    Performance
General Purpose       2×         2×       ~1.4×
Multimedia Kernels    < 10%      < 10%    1.5 ~ 4×
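The general-purpose row is consistent with Pollack’s rule. This empirical rule of thumb, not stated on the slide, says that single-core performance grows roughly with the square root of core area. The sketch assumes that power scales with area, as in the table.

```python
import math


def pollack_speedup(area_ratio: float) -> float:
    """Pollack's rule of thumb: single-core performance ~ sqrt(core area)."""
    return math.sqrt(area_ratio)


if __name__ == "__main__":
    # Spend 2x area (about 2x power, per the table) on one larger core...
    big_core = pollack_speedup(2.0)
    # ...or on two copies of the original core running a perfectly parallel workload.
    two_cores = 2 * pollack_speedup(1.0)
    for name, perf in (("one 2x-area core", big_core), ("two 1x-area cores", two_cores)):
        print(f"{name}: {perf:.2f}x performance, {perf / 2.0:.2f}x performance per watt")
```

Doubling area buys √2 ≈ 1.41× for one larger core. The same area spent on two copies of the original core yields up to 2× throughput, provided the workload is parallel.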

Software Virtualization: A Disruptive Force for SoC Design

[Figure: traditional design cycle for a typical SoC, on a 12-to-24-month timeline. Phases: IC design; design, test, and package; samples; test & qualification, IC validation, engineering samples; firmware development; driver development; manufacturing; production samples; reference design; customer evaluation; design win; production and software development.]

[Figure: “virtual” design cycle for a typical SoC, on the same 12-to-24-month timeline. Phases: virtual design; customer application development; prototype evaluation; testing & qualification, IC validation, engineering samples; software development, production samples, driver development; manufacturing; customer evaluation; design win; production.]

Benefits of the virtual design cycle:
• Reduced time to market
• Reduced risk
• More effective collaboration between the IC vendor and customer
• Key enabler for SoC startups

Source: Gartner

What is SoC?

What is SoC?

• An SoC is a system on an IC that integrates software and hardware Intellectual Property (IP) using more than one design methodology for the purpose of defining the functionality and behavior of the proposed system.
• The designed system is application specific.
• Typical applications of SoC:
  - Consumer devices
  - Networking
  - Communications
  - Other segments of the electronics industry

[Figure: an SoC integrating microprocessors (mp), memory, a video unit, graphics, communications (coms), a DSP, custom logic, and software]

System on Chip Benefits

[Figure: up to now, a system has been a collection of chips (CPU, DSP, IPsec, memory, USB hub, etc.) costing typically about $70. Now it is a collection of cores (processor, co-processor, and other IP cores) on one chip, costing typically about $10.]

Typical approach:
• Define requirements
• Design with off-the-shelf chips
  - At the 0.5-year mark: first prototypes
  - At 1 year: ship with low margins or at a loss; start ASIC integration
  - At 2 years: ASIC-based prototypes
  - At 2.5 years: ship and make profits (with competition)

With SoC:
• Define requirements
• Design with off-the-shelf cores
  - At the 0.5-year mark: first prototypes
  - At 1 year: ship with high margins and market share

SoC Architecture

• Hardware architecture
  - CPU, hardware IP
  - Diverse memory elements
  - I/O interface
  - Bus
• SoC complexity is increasing
  - The number of processing elements grows
  - Communication architecture: switched bus, NoC (Network on Chip)

Intrinsix AMBA SoC Platform, Intrinsix Co. (http://www.intrinsix.com/intrinsix-ip/soc-ip/amba.htm)

Chip Design is Now System Design

[Figure: evolution of chip design through timing-driven design, block-based design, platform-based design, and plug-and-play system on a chip (moving into the mainstream), from ASICs on DSM to complex ASICs with a few IPs and beyond. Example blocks: logic, µP core, SRAM, ROM, soft IF/IP, DSP, MPEG, RAM, cache, I/F.]

Basic SoC Architecture – System example

[Figure: smartphone system configuration built around a mobile SoC. Off-chip components: PMIC, audio codec, main TFT LCD & touch screen panel (TSP), GPS, debug port, modem, QWERTY keys, Li-ion system power, SLC/MLC NAND, SRAM/ROM/NOR, DRAM (mDDR, LPDDR1, LPDDR2), DVB-H/T-DMB/Wi-Fi, WiBro/Wi-Fi. SoC interfaces and blocks: UART0 and UART1 (CTS/RTS), UART2/IrDA v1.1, USB Host 2.0, USB OTG 2.0, IIS (3-ch), HS-MMC/SD/SDIO (3-ch), IIC/PWI, TFT LCDC, SMC, NAND flash I/F, HDMI, video codec, AC97 (1-ch)/PCM (2-ch), keypad (16x8), HS-SPI (3-ch), modem I/F, 2D engine, 3D engine, JPEG codec, 12MP ISP, DRAM controller DRAMC0 (333).]

Basic SoC Architecture – Issues in SoC

[Figure: SoC block diagram built around a ×64 multi-layer AHB/AXI bus. Labeled groups: CPU core, memory subsystem, multimedia acceleration, connectivity, system peripherals, power management. Blocks include core & L1 cache, L2 cache, mobile SDRAM/mobile DDR SDRAM, LPDDR1/LPDDR2, SRAM/ROM/NOR, OneDRAM, NAND flash, camera IF, video codec, graphics engine, JPEG codec, TV out, TFT LCD controller with DSI, LCDC/OSD, color TFT LCD, UART, USB, IrDA, I2S, I2C, GPIO, HS-MMC/SD, SPI, modem I/F, ATA, AC97/PCM, PLL, RTC, timer with PWM, watchdog timer, DMA, keypad, ADC & touch screen, dynamic voltage frequency scaling, crypto accelerator, secure ROM, secure RAM.]

Design issues:
• CPU choice and cache structure decision
• ISP spec decision and architecture design
• MFC optimization and architecture design
• Visual system architecture design
• Memory system design
• Memory controller optimization
• File system optimization
• Security system architecture design
• Communication system performance optimization
• Low-power audio playback architecture design
• System low-power architecture design
• System bus architecture design
• Physical design balancing
• Clock and reset architecture design

Performance Architecture – Performance factors

Basic performance factors: core performance, memory latency, bandwidth

[Figure: basic SoC block diagram: CPU, multimedia, multilevel interconnect bus, DRAM, peripherals & other memories]

• CPU subsystem (core performance)
  - Cortex-A15 / Cortex-A7
  - L2 cache
  - Enhancing CPU clock speed
• Multimedia subsystem (core performance, latency, bandwidth)
  - Logic parallelism
  - Prefetching
  - Optimizing access pattern
  - Buffering & caching
• Memory subsystem, esp. DRAM (latency, bandwidth)
  - Reducing controller latency
  - Advanced precharge scheme
  - Optimizing scheduling scheme
  - Optimizing memory structure for video
  - Enhancing I/O speed
• Bus subsystem (latency, bandwidth)
  - Reducing arbitration delay
  - Reducing interconnect latency
  - Enhancing bus clock speed
  - Optimizing arbitration scheme
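The arbitration scheme is one of the bus-subsystem knobs listed above. The slides do not prescribe a policy. Round-robin is used here only as one common fair scheme, and the three masters are hypothetical.

```python
from typing import Optional, Sequence


class RoundRobinArbiter:
    """Round-robin bus arbiter: the most recently granted master gets the lowest priority next."""

    def __init__(self, n_masters: int) -> None:
        self.n = n_masters
        self.last = n_masters - 1  # so master 0 has top priority on the first request

    def arbitrate(self, requests: Sequence[bool]) -> Optional[int]:
        """Return the index of the granted master, or None if no master is requesting."""
        for offset in range(1, self.n + 1):
            candidate = (self.last + offset) % self.n
            if requests[candidate]:
                self.last = candidate
                return candidate
        return None


if __name__ == "__main__":
    arbiter = RoundRobinArbiter(3)  # e.g. CPU, multimedia, DMA (hypothetical masters)
    for cycle in range(4):
        print(f"cycle {cycle}: grant -> master {arbiter.arbitrate([True, True, True])}")
```

Rotating priority keeps a busy master from starving the others, but it does not favor latency-critical masters such as a display controller. Real SoC arbiters often mix fixed and rotating priorities or add QoS weighting.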

The Spectrum of Architectures

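[Figure: spectrum of architectures, from customization fully in hardware (custom ASIC and structured ASIC, built through hardware development and synthesis) to customization fully in software (microprocessors, programmed through software development and compilation). In between: FPGAs (Xilinx, Altera), polymorphic computing architectures and tiled architectures (MONARCH, RAW, TRIPS; PACT, PicoChip), and fixed + variable ISA processors (Tensilica, Stretch Inc.); structured-ASIC examples include LSI Logic and Leopard Logic. Moving toward hardware customization increases design NRE effort and time to market; moving toward software decreases customization.]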

Interlocking Trade-offs

[Figure: interlocking trade-offs among power, memory, frequency, and ILP, connected by bandwidth, dynamic power, dynamic penalties, and leakage power]

Multi-core Architecture Drivers

• Addressing ILP limits
  - Multiple threads
  - Coarse-grain parallelism → raise the level of abstraction
• Addressing frequency and power limits
  - Multiple slower cores across technology generations
  - Scaling via increasing the number of cores rather than frequency
  - Heterogeneous cores for improved power/performance
• Addressing memory system limits
  - Deep, distributed cache hierarchies
  - OS replication → shared memory remains dominant
• Addressing manufacturing issues
  - Design and verification costs → replication → the network becomes more important!

3D IC (System-in-Package) Is the Next Revolution

[Figure: stacked layers of CMOS, memory, RF, MEMS, and photonics]

• Better performance
  - Massive bandwidth
  - Reduced interconnect delays
  - Power reduction
  - Higher functionality per space
  - Heterogeneous integration
• Smaller size
  - 3D maximizes space utilization
• Lower cost
  - Lower cost vs. a next-generation device
  - Reuse of proven SIP

Traditional IC Design Flow

Conventional Design Flow: Circular (Gajski’s) Y-Chart

[Figure: Gajski’s Y-chart with three domains and concentric abstraction levels.
 Behavior domain: systems, algorithms, register transfers, logic, transfer functions.
 Structure domain: processors; ALUs, RAM, etc.; gates, flip-flops, etc.; transistors.
 Physical domain: physical partitions, floorplans, module layout, cell layout, transistor layout.
 Levels: algorithm & system design, structural & logic design, transistor-level design, layout design.]

Conventional Design Flow: Digital (VLSI) System

• Top → down
  - Front-end design: system-level design/simulation → behavioral-level design/simulation → register transfer level (RTL) design/simulation → logic synthesis → logic-level design/simulation (gate level + switch level)
  - Back-end design: layout design → post-layout verification

Conventional Design Flow: Analog/RF System

• Bottom → up
  - Front-end design: system integration simulation, architecture decision, function block design, circuit structure design/simulation, transistor/component selection
  - Back-end design: layout design

Mixed-Signal Top-Down Design Flow

• Using a mixed-signal simulator
  - Front-end design: system design/simulation → architecture decision → function block design/simulation → circuit structure design/simulation → transistor/component selection
  - Back-end design: layout design
• The simulation is not really for “performance” prediction but for “function” prediction!

A Complete Top-Down Design Methodology:

• System simulation, then partition into digital blocks and analog blocks
  - Digital blocks: RTL design → synthesis → gate netlist → place & routing
  - Analog blocks: block design → circuit design → layout design
• Layout integration
• A mixed-signal simulator is used for verification across the flow

Traditional Taxonomy

• Front end
  - Behavioral level design
  - Logic design and simulation
  - Logic synthesis
  - Logic partitioning, die planning
  - Simulation, floorplanning
  - Function verification, timing verification
  - Test generation
• Back end
  - IO pad placement
  - Global placement
  - Detail placement
  - Clock tree synthesis and routing
  - Power/ground stripes and rings routing
  - Global routing
  - Detail routing
  - Extraction and delay calculation, timing verification
  - LVS, DRC, ERC

Levels of VLSI Design in a Traditional Flow

• Specification
  - What the system is supposed to do
• Architecture
  - High-level design of components
  - State defined
  - Logic partitioned into major blocks
• Logic
  - Gates, flip-flops, and the connections between them
• Circuit
  - Transistor circuits to realize logic elements
• Device
  - Behavior of individual circuit elements
• Layout
  - Geometry used to define and connect circuit elements
• Process
  - Steps used to define circuit elements

[Figure: architecture design → high-level synthesis → RTL → synthesis → placement → routing → extraction and timing verification → GDSII → manufacturing, with verification alongside]

High-Level Synthesis (Behavior à RTL)

• Scheduling
  - Assignment of each operation to a time slot corresponding to a clock cycle or time interval
• Resource allocation
  - Selection of the types of hardware components, and the number of each type, to be included in the final implementation
• Module binding
  - Assignment of operations to the allocated hardware components
• Controller synthesis
  - Design of the control style and clocking scheme
• Compilation
  - Of the input specification language into the internal representation
• Parallelism extraction
  - Usually via data-flow analysis techniques
• …
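Scheduling and resource allocation can be illustrated on a tiny data-flow graph. This sketch uses ASAP (as-soon-as-possible) scheduling with unit-latency operations and unconstrained resources. The graph for y = (a*b + c*d) - e*f and the helper names are hypothetical; production HLS tools use more elaborate schedulers.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical data-flow graph for y = (a*b + c*d) - e*f.
# op -> (operation type, predecessor ops); every op takes one control step.
DFG: Dict[str, Tuple[str, List[str]]] = {
    "m1": ("mul", []),            # a*b
    "m2": ("mul", []),            # c*d
    "m3": ("mul", []),            # e*f
    "a1": ("add", ["m1", "m2"]),  # a*b + c*d
    "s1": ("sub", ["a1", "m3"]),  # (a*b + c*d) - e*f
}


def asap_schedule(dfg: Dict[str, Tuple[str, List[str]]]) -> Dict[str, int]:
    """ASAP scheduling: place each op in the first step after all of its predecessors."""
    step: Dict[str, int] = {}
    remaining = dict(dfg)
    while remaining:
        ready = [op for op, (_, preds) in remaining.items() if all(p in step for p in preds)]
        if not ready:
            raise ValueError("data-flow graph contains a cycle")
        for op in ready:
            _, preds = remaining.pop(op)
            step[op] = 1 + max((step[p] for p in preds), default=0)
    return step


def allocate(dfg: Dict[str, Tuple[str, List[str]]], step: Dict[str, int]) -> Dict[str, int]:
    """Resource allocation implied by a schedule: peak units of each type used in one step."""
    usage = Counter((step[op], kind) for op, (kind, _) in dfg.items())
    need: Counter = Counter()
    for (_, kind), count in usage.items():
        need[kind] = max(need[kind], count)
    return dict(need)


if __name__ == "__main__":
    schedule = asap_schedule(DFG)
    print("schedule:", schedule)
    print("units   :", allocate(DFG, schedule))
    delayed = dict(schedule, m3=2)  # m3 moved later; still before its consumer s1
    print("units with m3 in step 2:", allocate(DFG, delayed))
```

ASAP puts all three multiplications in step 1, so allocation needs three multipliers. Moving m3 to step 2 is still legal, because its only consumer s1 runs in step 3. This cuts the multiplier count to two without lengthening the schedule, and module binding would then assign m1, m2, and m3 to those two multipliers.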

Architecture Level Floorplanning

• Defines the basic chip layout architecture
  - Define the standard cell rows and I/O placement locations
  - Place RAMs and other macros
  - Separate gate array, memory, analog, and RF blocks
  - Define power distribution structures such as rings and stripes
  - Allow space for clock, major buses, etc.
• Rules of thumb for cell density are used to initially calculate the design size
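The cell-density rule of thumb can be turned into a first-cut die estimate. All numbers here are hypothetical placeholders: gate count, area per gate, target utilization, macro area, and I/O ring width. The square-die assumption is also a simplification.

```python
import math


def estimate_die_side_mm(gate_count: float, gate_area_um2: float, utilization: float,
                         macro_area_mm2: float = 0.0, io_ring_mm: float = 0.0) -> float:
    """First-cut edge length (mm) of a square die from a cell-density rule of thumb."""
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    logic_area_mm2 = gate_count * gate_area_um2 / utilization / 1e6  # um^2 -> mm^2
    core_side_mm = math.sqrt(logic_area_mm2 + macro_area_mm2)
    return core_side_mm + 2 * io_ring_mm  # I/O ring on both sides


if __name__ == "__main__":
    side = estimate_die_side_mm(10e6, 0.5, 0.70, macro_area_mm2=4.0, io_ring_mm=0.2)
    print(f"estimated die: {side:.2f} mm x {side:.2f} mm")
```

Take 10 M gates at 0.5 µm² per gate, 70% utilization, 4 mm² of macros, and a 0.2 mm I/O ring. The estimate comes to about 3.7 mm per side.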


Logic Synthesis

• Conversion of RTL to a gate-level netlist
  - Targeted to a foundry-specific library
  - Can be performed hierarchically (block by block)
• Timing-driven
  - Clock information
  - Primary input arrival times, primary output required times
  - Input driving cells, output loading
  - False paths, multi-cycle paths
• Interconnect delay may be calculated based on a “wireload model”, which uses fanout to estimate delay
• Clock parameters (insertion delay, skew, jitter, etc.) are assumed to be attainable later in place and route
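A wireload model is a table, supplied with the library, that maps a net’s fanout to an estimated wire length or capacitance before any placement exists. This sketch uses a hypothetical table, hypothetical per-micron R and C values, and a simple lumped RC delay estimate. Real libraries and synthesis tools define their own tables and delay calculation.

```python
import bisect
from typing import List, Tuple

# Hypothetical wireload table: (fanout, estimated wire length in um).
WLM_TABLE: List[Tuple[int, float]] = [(1, 20.0), (2, 45.0), (3, 75.0), (4, 110.0)]
EXTRA_UM_PER_FANOUT = 40.0  # hypothetical slope beyond the last table entry
CAP_FF_PER_UM = 0.2         # hypothetical wire capacitance per um
RES_OHM_PER_UM = 0.5        # hypothetical wire resistance per um


def wire_length_um(fanout: int) -> float:
    """Look up the estimated wire length for a fanout, extrapolating linearly past the table."""
    fanouts = [f for f, _ in WLM_TABLE]
    if fanout <= fanouts[-1]:
        return WLM_TABLE[bisect.bisect_left(fanouts, fanout)][1]
    return WLM_TABLE[-1][1] + (fanout - fanouts[-1]) * EXTRA_UM_PER_FANOUT


def net_delay_ps(fanout: int, r_drive_ohm: float, c_pin_ff: float) -> float:
    """Lumped RC estimate: the driver charges wire + pin load, and the wire
    resistance drives half its own capacitance plus the pins."""
    length = wire_length_um(fanout)
    c_wire = length * CAP_FF_PER_UM
    r_wire = length * RES_OHM_PER_UM
    c_pins = fanout * c_pin_ff
    ohm_ff = r_drive_ohm * (c_wire + c_pins) + r_wire * (c_wire / 2 + c_pins)
    return ohm_ff * 1e-3  # 1 ohm * 1 fF = 1e-15 s = 1e-3 ps


if __name__ == "__main__":
    for fo in (1, 2, 4, 8):
        print(f"fanout {fo}: {wire_length_um(fo):6.1f} um, {net_delay_ps(fo, 1000.0, 2.0):6.2f} ps")
```

The estimate depends only on fanout, so two nets with the same fanout get the same delay even if placement later makes one much longer. This is why timing must be re-verified after layout.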


Formal Verification

n RTL description and gate level netlist are compared to verify functional equivalence, thereby verifying the synthesis results

• Formal methods
  - Graph isomorphism
  - Binary Decision Diagrams (BDDs)
• An emerging technology that supplements the more traditional gate-level simulation approach
• FV is also performed after place-and-route (if the gate netlist changes)
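The idea behind equivalence checking can be shown on a tiny combinational example. This sketch substitutes exhaustive truth-table enumeration, which is practical only for a handful of inputs; real formal tools use BDDs (as listed above) or SAT solvers instead. Both multiplexer functions and the buggy variant are hypothetical.

```python
from itertools import product
from typing import Callable, Optional, Tuple


def rtl_mux(sel: int, a: int, b: int) -> int:
    """Behavioral 'RTL' reference: 2:1 multiplexer."""
    return a if sel else b


def gate_mux(sel: int, a: int, b: int) -> int:
    """'Gate-level' version: (sel AND a) OR ((NOT sel) AND b)."""
    return (sel & a) | ((1 - sel) & b)


def buggy_mux(sel: int, a: int, b: int) -> int:
    """Faulty netlist: b is not gated with NOT sel."""
    return (sel & a) | b


def equivalent(f: Callable[..., int], g: Callable[..., int],
               n_inputs: int) -> Tuple[bool, Optional[Tuple[int, ...]]]:
    """Compare two combinational functions on all 2**n_inputs input vectors."""
    for vector in product((0, 1), repeat=n_inputs):
        if f(*vector) != g(*vector):
            return False, vector  # first counterexample found
    return True, None


if __name__ == "__main__":
    print("rtl vs gate :", equivalent(rtl_mux, gate_mux, 3))
    print("rtl vs buggy:", equivalent(rtl_mux, buggy_mux, 3))
```

The checker returns the first counterexample it finds, here (sel, a, b) = (1, 0, 1) for the buggy netlist. A formal tool reports the same kind of result, without enumerating every vector.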


RTL Simulation

• RTL code, written in Verilog, VHDL, or a combination of both, is simulated to verify functional correctness
• Testbenches apply input stimulus to the design
• Several methods are used to verify the outputs
  - Self-checking testbenches automatically verify output correctness and report mismatches
  - Results can be stored in a file and compared to previous results
  - Waveform displays can be used to interactively verify the outputs
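A self-checking testbench pairs stimulus generation with a reference model and reports mismatches. The sketch mirrors that structure in Python; in practice the design under test is Verilog/VHDL running in a simulator. The saturating adder and its reference model are hypothetical.

```python
import random
from typing import Callable, List, Tuple


def reference_sat_add(a: int, b: int) -> int:
    """Golden model: 8-bit unsigned saturating adder."""
    return min(a + b, 0xFF)


def dut_sat_add(a: int, b: int) -> int:
    """Stand-in for the simulated design: 9-bit sum, clamped when the carry bit is set."""
    s = (a + b) & 0x1FF
    return 0xFF if s & 0x100 else s


def run_testbench(dut: Callable[[int, int], int], ref: Callable[[int, int], int],
                  n_random: int = 1000, seed: int = 1) -> List[Tuple[int, int, int, int]]:
    """Apply directed + random stimulus; return (a, b, got, expected) for every mismatch."""
    rng = random.Random(seed)
    directed = [(0, 0), (255, 0), (128, 127), (128, 128), (255, 255)]
    randoms = [(rng.randrange(256), rng.randrange(256)) for _ in range(n_random)]
    mismatches = []
    for a, b in directed + randoms:
        got, expected = dut(a, b), ref(a, b)
        if got != expected:
            mismatches.append((a, b, got, expected))
    return mismatches


if __name__ == "__main__":
    errors = run_testbench(dut_sat_add, reference_sat_add)
    print("PASS" if not errors else f"FAIL: {len(errors)} mismatches, first {errors[0]}")
```

Directed corner cases are applied alongside random vectors, because random stimulus alone may miss boundary values such as 255 + 255.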


Gate-Level Simulation

• Covers both functionality and timing
• Correctness is only as good as the test vectors used
• Especially critical for non-synchronous designs and for verifying false-path and multi-cycle-path constraints
• Cell timing is included in the simulation models, and interconnect delay is passed from the synthesis run
• Worst-case PVT conditions are used to analyze for setup violations, and best-case PVT conditions are used to analyze for hold violations
  - PVT = Process, Voltage, Temperature

Static Timing Analysis

• Verifies that the design operates at the desired frequency
  - Implicitly assumes correct timing constraints (!), e.g., boundary conditions
• Timing constraints are similar to those used by logic synthesis
• Verifies setup and hold times at FF inputs; can also check timing from and to PIs and POs; can also check point-to-point delay values (with blocking of pins, etc.)
• As with gate-level simulation, both best- and worst-case analyses are performed
• Typically performed on a full-chip (not block) basis
  - May require modified constraints for inter-block issues: multiple clock domains, multi-cycle paths, etc.
• For compatibility with a timing-driven layout flow, it helps to have a simple, single set of constraints
• Other issues: incremental analysis, …
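Static timing analysis propagates arrival times through a timing graph instead of simulating vectors. The sketch below uses a hypothetical register-to-register path whose edge delays are given as (best-case, worst-case) pairs, standing in for the best- and worst-case PVT corners. It assumes an ideal clock (no skew or jitter), folds the clock-to-Q delay into the first edge, and relies on graphlib, which requires Python 3.9+.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Tuple

# Hypothetical timing graph: (from, to) -> (best-case delay, worst-case delay) in ns.
EDGES: Dict[Tuple[str, str], Tuple[float, float]] = {
    ("FF1.Q", "U1"): (0.10, 0.20),
    ("U1", "U2"): (0.15, 0.35),
    ("U1", "U3"): (0.05, 0.10),
    ("U2", "FF2.D"): (0.20, 0.40),
    ("U3", "FF2.D"): (0.10, 0.25),
}


def arrival_times(edges: Dict[Tuple[str, str], Tuple[float, float]],
                  launch: str, late: bool) -> Dict[str, float]:
    """Propagate arrivals from `launch`: max over worst-case delays (late) or min over best-case (early)."""
    preds: Dict[str, List[str]] = {}
    for u, v in edges:
        preds.setdefault(u, [])
        preds.setdefault(v, []).append(u)
    arrival: Dict[str, float] = {launch: 0.0}
    for node in TopologicalSorter(preds).static_order():
        candidates = [arrival[u] + edges[(u, node)][1 if late else 0]
                      for u in preds[node] if u in arrival]
        if candidates and node != launch:
            arrival[node] = max(candidates) if late else min(candidates)
    return arrival


def slacks(period: float, t_setup: float, t_hold: float,
           launch: str = "FF1.Q", capture: str = "FF2.D") -> Tuple[float, float]:
    """Setup slack = period - t_setup - latest arrival; hold slack = earliest arrival - t_hold."""
    latest = arrival_times(EDGES, launch, late=True)[capture]
    earliest = arrival_times(EDGES, launch, late=False)[capture]
    return period - t_setup - latest, earliest - t_hold


if __name__ == "__main__":
    setup, hold = slacks(period=1.0, t_setup=0.08, t_hold=0.05)
    print(f"setup slack {setup:+.2f} ns ({'met' if setup >= 0 else 'VIOLATED'})")
    print(f"hold slack  {hold:+.2f} ns ({'met' if hold >= 0 else 'VIOLATED'})")
```

The example uses a 1.0 ns clock, an 80 ps setup time, and a 50 ps hold time. The slow path (0.95 ns) misses setup by 30 ps, while the fast path (0.25 ns) meets hold with 200 ps of slack.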


A More Detailed Design Flow (A. Khan, Simplex/Altius)

[Figure: design flow from requirements to tape-out. Stages: requirements, design specs, functional RTL design, floorplan & power grid (library + CWLM), synthesis (library + CWLM + constraints), placement, physical re-synthesis, clock distribution, routing with scan re-order, timing analysis and IPO, functional/power/SI ECO, ERC/DRC/LVS, tape-out. Annotations:
 • Architectural optimization (timing); inter-group buses, bandwidth; clock, SI, test; validation
 • Floorplanning and custom WLM; power distribution (internal, I/O); I/O driver and pad-ring design; board-level timing, SI
 • Row definitions; placement of cells; congestion analysis
 • Placement-based re-synthesis; noise minimization, isolation; clock distribution
 • Full routing; scan stitching, re-ordering
 • Full RC back-annotation; hierarchical timing, electrical, and SI analysis, and IPO/ECO]

(CWLM: custom wireload model; SI: signal integrity; IPO: in-place optimization; ECO: engineering change order)

SoC Design

Paradigm Shift in SoC Design

[Figure: from system on a board to system on a chip]

Evolutionary Problems

• Emerging new technologies:
  - Greater complexity
  - Increased performance
  - Higher density
  - Lower power dissipation
• Key challenges
  - Improve productivity
  - HW/SW codesign
  - Integration of analog & RF IPs
  - Improved DFT
• Evolutionary techniques:
  - IP (Intellectual Property) based design
  - Platform-based design

Migration from ASICs to SoCs

• ASICs are logic chips designed by end customers to perform a specific function for a desired application.
• ASIC vendors supply libraries for each technology they provide. In most cases, these libraries contain predesigned and pre-verified logic circuits.
• ASIC technologies are:
  - Gate array
  - Standard cell
  - Full custom

Migration from ASICs to SoCs

n In the mid-1990s, ASIC technology evolved from a chip-set philosophy to an embedded-cores-based system-on-a-chip concept.

n An SoC is an IC designed by stitching together multiple stand-alone VLSI designs to provide full functionality for an application.

• An SoC is composed of predesigned models of complex functions known as cores (also called intellectual property blocks, virtual components, or macros) that serve a variety of applications.

SoC Design Challenges

• Why does it take longer to design SoCs than traditional ASICs?
• We must examine the factors that influence the degree of difficulty and the Turn Around Time (TAT), i.e., the time from the gate-level netlist to the metal-mask-ready stage, for designing ASICs and SoCs.
• For an ASIC, the following factors influence TAT:
  - Frequency of the design
  - Number of clock domains
  - Number of gates
  - Density
  - Number of blocks and sub-blocks
• The key factor that influences TAT for SoCs is system integration (integrating different silicon IPs on the same IC).

SoCs vs. ASICs

• An SoC is not just a large ASIC
  - Architectural approach involving significant design reuse
  - Addresses the cost and time-to-market problems
• SoC methodology is an incremental step over ASIC methodology
• SoC design is significantly more complex
  - Needs cross-domain optimizations
  - IP reuse and platform-based design increase productivity, but not enough
  - Even with extensive IP reuse, many of the ASIC design problems remain, plus many more ...
  - The productivity increase is far from closing the design gap

The challenge for designers is not whether to adopt reuse, but how to employ it effectively.

Design for Reuse

• To overcome the design gap, design reuse (the use of pre-designed and pre-verified cores, or reuse of existing designs) becomes a vital concept in design methodology.
• An effective block-based design methodology requires an extensive library of reusable blocks, or macros, and is based on the following principles:
  - The macro must be extremely easy to integrate into the overall chip design.
  - The macro must be so robust that the integrator has to perform essentially no functional verification of the macro’s internals.

Design for Reuse

• To be fully reusable, the hardware macro must be:
  - Designed to solve a general problem: easily configurable to fit different applications.
  - Designed for use in multiple technologies: for soft macros, this means that the synthesis scripts must produce satisfactory quality of results with a variety of libraries; for hard macros, this means having an effective porting strategy for mapping the macro onto new technologies.
  - Designed for simulation with a variety of simulators: good design reuse practices dictate that both a Verilog and a VHDL version of each model and verification testbench should be available, and that they should work with all the major commercial simulators.
  - Designed with standards-based interfaces: unique or custom interfaces should be used only if no standards-based interface exists.

Design for Reuse – cont.

• To be fully reusable, the hardware macro must be:
  - Verified independently of the chip in which it will be used: often, macros are designed and only partially tested before being integrated into a chip for verification. Reusable designs must have full, stand-alone testbenches and verification suites that afford very high levels of test coverage.
  - Verified to a high level of confidence: this usually means very rigorous verification as well as building a physical prototype that is tested in an actual system running real software.
  - Fully documented in terms of appropriate applications and restrictions: in particular, valid configurations and parameter values must be documented. Any restrictions on configurations or parameter values must be clearly stated. Interfacing requirements and restrictions on how the macro can be used must be documented.

Resources vs. Number of Uses

Intellectual Property

• Utilizing predesigned modules makes it possible to:
  - Avoid reinventing the wheel for every new product
  - Accelerate the development of new products
  - Assemble the various blocks of a large ASIC/SoC quite rapidly
  - Reduce the risk of failure that comes with designing and verifying a block for the first time
• These predesigned modules are commonly called Intellectual Property (IP) cores or Virtual Components (VC).

Intellectual Property Categories

• IP cores are classified into three distinct categories:
  - Hard IP cores consist of hard layouts using particular physical design libraries and are delivered as mask-level designed blocks (GDSII format). Integrating hard IP cores is quite simple, but hard cores are technology dependent and provide minimal flexibility and portability in reconfiguration and integration.
  - Soft IP cores are delivered as RTL VHDL/Verilog code that provides a functional description of the IP. These cores offer maximum flexibility and reconfigurability to match the requirements of a specific design application, but they must be synthesized, optimized, and verified by their user before integration into designs.
  - Firm IP cores bring the best of both worlds, balancing the high performance and optimization properties of hard IPs with the flexibility of soft IPs. These cores are delivered as netlists targeted to specific physical libraries, after synthesis but without physical layout.

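Trade-offs among Soft, Firm, and Hard Cores

[Figure: moving from soft core to firm core to hard core increases predictability, performance, and time-to-market advantage; moving from hard core to soft core increases reusability, portability, and flexibility.]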

Comparison of Different IP Formats

IP      Representation    Optimization   Technology               Reusability
Hard    GDSII             Very High      Technology Dependent     Low
Soft    RTL               Low            Technology Independent   Very High
Firm    Target Netlist    High           Technology Generic       High

The Design Process of SoCs

• SoC designs are made possible by deep-submicron technology. This technology presents a whole set of design challenges, including:
  - Interconnect delays
  - Clock and power distribution
  - Placement and routing of millions of gates
• These physical design problems can have a significant impact on the functional design of SoCs and on the design process itself.
• The first step in system design is specifying the required functionality.
• The second step is to transform the system functionality into an architecture that defines the system implementation by specifying the number and types of components and the connections between them.

Define Hardware-Software Codesign

• Hardware-software codesign is the concurrent and cooperative design of the hardware and software components of a system.
• The SoC design process is a hardware-software codesign in which design productivity is achieved through design reuse.
• The design process is the set of design tasks that transforms an abstract specification model into an architectural model.

SoC Co-design Flow

[Figure: SoC co-design flow. HW/SW partitioning, supported by an estimator and an architecture description language, splits the design specification into HW (Verilog, VHDL) and SW (C++). HW goes through synthesis and SW through a compiler, followed by verification and co-verification, with an IP library enabling design reuse. Goals: rapid design space exploration, quality tool-kit generation, design reuse. The target architecture includes a processor core, synthesized HW, an interface, on-chip memory, and off-chip memory (blocks labeled M1, P1, P2).]

A canonical or generic form of an SoC design. These chips have:
• one or more processors
• large amounts of memory
• bus-based architectures
• peripherals
• coprocessors
• I/O channels

Design Process

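[Figure: SoC design timeline (time in weeks). Phases: top-level design, unit block design, unit block verification, integration and synthesis (trial netlists), system-level verification, timing convergence & verification, fabrication, DVT prep, DVT. Phase durations shown: 4, 14, 5, 4, 4, and 2 weeks. Time to mask order: 24 weeks; the timeline also marks 33 weeks.]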

SoC Typical Design Steps

• With the increasing complexity of ICs and decreasing geometries, the IC vendor steps of placement, layout, and fabrication are unlikely to be greatly reduced.
• In fact, there is a greater risk that the timing convergence steps will involve more iterations.
• Need to reduce the time before the vendor steps.
• Need to consider layout issues up front.

(DVT: Design Validation Test)

SoC Typical Design Steps

• The SoC architecture is already defined. It is flexible enough to scale in frequency and complexity, and it allows new IP cores and new technology to be integrated.
• Separate the design of the reusable IP from the design of the SoC; build the SoC from a library of tested IP.
• Unit design consists only of any additional core features or wrapping new IP to enable integration.
• Reusable IP is purchased from external sources, developed from in-house designs, or designed as a separate project off the critical SoC development path.


SoC Methodology


SoC Methodology Evolving ...


How to Design an SoC


System on Chip - Testing

[Figure: SoC test architecture with I/O pads, an IEEE 1149.1 TAP controller, user-defined logic, CPU core, self-test control, legacy core, IP hard core, DSP core, memory array, interface control, and embedded DRAM]

Main SoC testing challenges:
• Core-level test: embedded cores are tested as part of the system
• Test access: because there is no physical access to the core peripheries, an electronic access mechanism is required
• SoC-level test: the SoC test is a single composite test that includes individual core tests, UDL (User-Defined Logic) tests, and test scheduling

Test data volume for core-based SoC designs is very high:
• New techniques are required to reduce testing time, test cost, and the memory requirements of the automatic test equipment (ATE)
• SoCs are complex designs combining logic, memory, and mixed-signal circuits in a single IC
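The scale of the test-data problem can be estimated from scan parameters. This first-order sketch makes several assumptions:
• full scan, with stimulus and expected-response bits stored for every scan cell and pattern;
• balanced chains, with overlapped load/unload;
• hypothetical pattern count, scan-cell count, chain counts, and shift frequency;
• no test compression.

```python
import math
from typing import Tuple


def scan_test_cost(n_patterns: int, n_scan_cells: int, n_chains: int,
                   shift_mhz: float) -> Tuple[int, float]:
    """Return (ATE data volume in bits, shift time in seconds) for a full-scan test.

    Volume counts stimulus + expected response for every scan cell per pattern.
    Loading pattern i+1 overlaps unloading pattern i, so shifting takes
    (patterns + 1) * chain_length cycles; capture cycles are ignored.
    """
    chain_length = math.ceil(n_scan_cells / n_chains)
    volume_bits = 2 * n_patterns * n_scan_cells
    shift_cycles = (n_patterns + 1) * chain_length
    return volume_bits, shift_cycles / (shift_mhz * 1e6)


if __name__ == "__main__":
    for chains in (16, 512):
        bits, seconds = scan_test_cost(10_000, 2_000_000, chains, 50.0)
        print(f"{chains:3d} chains: {bits / 8e9:.1f} GB of test data, {seconds:.2f} s of shifting")
```

Take 10,000 patterns and 2 M scan cells. The data volume is 4×10¹⁰ bits (5 GB) for any chain count, but going from 16 to 512 chains cuts shift time from about 25 s to under 1 s. This is one reason test access mechanisms and compression matter.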

Summary

• A System on Chip (SoC) is an integrated circuit that implements most or all of the functions of a complete electronic system.
• Four vital areas of SoC:
  - Higher levels of abstraction
  - IP and platform reuse
  - IP creation: ASIPs, interconnect, and algorithms
  - Earlier software development and integration
