ECE5917 SoC Architecture: Introduction
Tae Hee Han: [email protected]
Semiconductor Systems Engineering
Sungkyunkwan University
Course Information
• Objectives
  - Educating competitive SoC engineers who have both a system concept and an understanding of the market
• Lecture Schedule
  - Wed. 13:30 ~ 16:15
• References for this course
  - Liming Xiu, VLSI Circuit Design Methodology Demystified, Wiley-Interscience, 2008
  - Hennessy & Patterson, Computer Architecture, 5th ed., Morgan Kaufmann, 2011
  - ARM processor architecture and AMBA bus manuals
  - Jacob et al., Memory Systems, Morgan Kaufmann, 2008
Course Schedule
Schedule | Contents | Remarks
Week 1 | Basic Concept, Introduction |
Week 2 | Case study: System |
Week 3 | Embedded System: Hardware/Software Interface |
Week 4~5 | ARM Processor and AMBA Bus System |
Week 6~7 | Memory and Peripheral Interface |
Week 8 | Midterm Exam |
Week 9~10 | MP SoC |
Week 11 | Poster Presentation |
Week 12~14 | On Chip Network |
Week 15 | Poster Presentation |
Week 16 | Final Exam |
Grading System
• Homework: 20%
• Attendance: 10%
• Midterm: 25%
• Final: 25%
• Poster (10%) + Presentation (10%): 20%
Outline
• Historical Perspective of IC and Issues
• What is SoC?
• Traditional Design Flow
• SoC Design
IC: Historical Background & Issues
The Invention of the Transistor
• John Bardeen, Walter Brattain & William Shockley invented "the first transistor" (a point-contact transistor, December 1947); Shockley's bipolar junction transistor followed in 1948
The Invention of the Integrated Circuit
• Jack Kilby (1958) and Robert Noyce (1959) independently invented "the integrated circuit".
Jack S. Kilby, winner of the 2000 Nobel Prize in Physics
Connected two bipolar transistors on the same substrate with bonding wires.
Moore’s Law (1965)
• The single most important guideline in microprocessor fabrication and architecture
  1. "the number of transistors per chip will double every 12 months" (Moore's 1965 figure; he revised it to 24 months in 1975, and 18 months is also widely quoted)
  2. "as the sophistication of chips goes up, the cost of [fabrication plants] goes up exponentially"
• Cost-integration relation
• Both have held true for more than four decades.
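As a quick illustration of how much the quoted doubling period matters, the sketch below projects a count forward under a constant doubling period. `projected_count` is a hypothetical helper, not something from the slides, and the Intel 4004 starting point (about 2,300 transistors in 1971) is used only as a familiar reference value.

```python
def projected_count(n0: float, years: float, doubling_months: float) -> float:
    """Project a count forward assuming it doubles every `doubling_months` months."""
    doublings = years * 12 / doubling_months
    return n0 * 2 ** doublings


# Growth over one decade for the three doubling periods quoted above.
for months in (12, 18, 24):
    print(f"{months}-month doubling: x{projected_count(1, 10, months):,.0f} in 10 years")

# Illustration: ~2,300 transistors (Intel 4004, 1971) doubling every 24 months for 40 years.
print(f"{projected_count(2300, 40, 24):,.0f} transistors")
```

With a 24-month period, 40 years is 20 doublings, so the projection lands at about 2.4 billion transistors; the 12- and 18-month readings diverge from it very quickly.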
(http://download.intel.com/museum/research/arc_collect/history_docs/pix/hoff1.jpg)
[Figure: Gordon Moore's original graph from 1965 (source: www.intel.com); relative manufacturing cost/component vs. number of components per integrated circuit]
MOS Transistor Scaling (1974 to present)
[Figure: S = 0.7 per node [0.5× per 2 nodes]; poly pitch (typical MPU/ASIC) and metal pitch (typical DRAM) vs. year. Source: 2001 ITRS - Exec. Summary, ORTC Figure]
§ Decreased transistor/feature sizes →
  § Increased variability (tox, BEOL, DFM, SEU, etc.)
  § Short-channel effects, leakage power
ø BEOL: Back-End-Of-Line
ø DFM: Design for Manufacturability
ø SEU: Single Event Upset
Scaling - FEOL, BEOL
ø FEOL: Front-End-Of-Line
ø BEOL: Back-End-Of-Line
Ecosystem of Integrated Circuits
Performance, Cost and Power
• Performance is a lasting theme
• Reducing cost while keeping performance
• Reducing power consumption while keeping performance and cost
[Figure: performance, cost, and power as a triangle of trade-offs and compromises, balanced with state-of-the-art design skills]
Source: GSA (Nov. 2012)
Declining Designs
Source: IBS (2012)
Cost of New Fab. Increases Dramatically
Source: GSA (Nov. 2012)
No Cost-effective Lithography Solutions
• High cost and limited availability of production EUV
• Integration of EUV and FinFET technology on 450 mm wafers
• These are expected to drive 3D integration, which will have a major impact on extending Moore's law
Source: IBM (2013)
[Figure: lithography roadmap across 45 nm, 32 nm, 22 nm, 14 nm, and 10 nm: immersion (ArFi); 2nd-generation immersion; 3rd-generation ArFi with Source Mask Optimization (SMO); 4th-generation ArFi with SMO and Double Patterning (DPL); 5th-generation ArFi with multilayer patterning, or EUV]
Moore's Law & More
[Figure: "More Moore" scaling (130 nm, 90 nm, 65 nm, 45 nm, 32 nm, 22 nm, 16 nm) combined with "More than Moore" functional diversification (analog/RF, HV power, passives, sensors, actuators, biochips), converging in multichip/multicomponent ICs: System-on-Chip (SoC) and System-in-Package (SiP); "Beyond CMOS" beyond scaling. Source: JSTC, adapted from ITRS 2011]
Technology Scaling
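• 30% scaling down in dimensions → doubles transistor density
• Power per transistor
  - Vdd scaling → lower power
• Transistor delay $t_d = C_{gate} V_{dd} / I_{SAT}$
  - $C_{gate}$, $V_{dd}$ scaling → lower delay
[Figure: MOSFET cross-section showing gate, source, drain, body, gate-oxide thickness tox, and channel length L]
$$P = \alpha C V_{dd}^{2} f + V_{dd} I_{st} + V_{dd} I_{leak}$$
where α is the switching activity factor, $I_{st}$ the short-circuit current, and $I_{leak}$ the leakage current.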
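The bullets above describe ideal (Dennard-style) scaling. A minimal sketch, assuming every dimension, C_gate and Vdd shrink by S = 0.7 per node, I_SAT scales by the same factor, and the I_st and I_leak terms are ignored, shows why power density stayed roughly constant while those assumptions held. The function names are illustrative only.

```python
def total_power(alpha, c, vdd, f, i_st=0.0, i_leak=0.0):
    """P = alpha*C*Vdd^2*f + Vdd*I_st + Vdd*I_leak (dynamic + short-circuit + leakage)."""
    return alpha * c * vdd ** 2 * f + vdd * i_st + vdd * i_leak


def ideal_scaling(s=0.7):
    """Relative change per node under ideal scaling by factor s.

    Assumptions: C and Vdd scale by s, I_sat scales by s, leakage and
    short-circuit currents are ignored, and frequency rises as 1/delay.
    """
    density = 1 / s ** 2                 # the same area holds 1/s^2 more transistors
    delay = (s * s) / s                  # C_gate * Vdd / I_sat
    freq = 1 / delay
    power_per_transistor = total_power(1.0, s, s, freq) / total_power(1.0, 1.0, 1.0, 1.0)
    power_density = power_per_transistor * density
    return density, delay, power_per_transistor, power_density


density, delay, p_tr, p_density = ideal_scaling()
print(f"density x{density:.2f}, delay x{delay:.2f}, "
      f"power/transistor x{p_tr:.2f}, power density x{p_density:.2f}")
```

If Vdd stops scaling while C and f keep their ideal trends, power per transistor stays flat and power density grows by about 1/S² per node instead.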
Fundamental Trends
High Volume Manufacturing | 2004 | 2006 | 2008 | 2010 | 2012 | 2014 | 2016 | 2018
Technology Node (nm) | 90 | 65 | 45 | 32 | 22 | 16 | 11 | 8
Integration Capacity (BT) | 2 | 4 | 8 | 16 | 32 | 64 | 128 | 256
Delay = CV/I scaling | 0.7 | ~0.7 | >0.7 (delay scaling will slow down)
Energy/Logic Op scaling | >0.35 | >0.5 | >0.5 (energy scaling will slow down)
Bulk Planar CMOS | High probability → Low probability
Alternate, 3G etc. | Low probability → High probability
Variability | Medium | High | Very High
ILD (K) | ~3 | <3 (reduce slowly towards 2-2.5)
RC Delay | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1
Metal Layers | 6-7 | 7-8 | 8-9 (0.5 to 1 layer per generation)
Source: Shekhar Borkar, Intel Corp.
2008 ITRS “Beyond CMOS” Definition Graphic
Computing and Data Storage Beyond CMOS
[Figure: roadmap from "More Moore" to "Beyond CMOS" across 32 nm, 22 nm, 16 nm, 11 nm, and 8 nm: baseline CMOS, ultimately scaled CMOS, functionally enhanced CMOS, spin logic devices, nanowire electronics, ferromagnetic logic devices; technology themes: channel replacement materials, multiple-gate MOSFETs, low-dimensional material channels, new state variables, new data representation, new devices, new data processing algorithms. Source: Emerging Research Device Working Group]
Device Structure Research Pipeline
Innovation and Disruptive Technology at Each Node
[Figure: device structures for 22/20 nm, 15/11 nm, and 8 nm & beyond: conventional planar device, FinFET, ETSOI, fully depleted devices, Si nanowire (Si NW), carbon (C) electronics; materials shown include HfO2 and deposited Si. Source: IBM (2011)]
What Would Be the Limit of Downsizing?
[Figure: source, channel, and drain with a 3 nm channel; tunneling distance indicated]
Source: Hiroshi Iwai (Tokyo Institute of Technology, 2013)
Impact of Moore’s Law To Date
• Push the memory wall → larger caches
• Increase frequency → deeper pipelines
• Increase ILP → concurrent threads, branch prediction and SMT
• Manage power → clock gating, activity minimization
[Figure: IBM Power5 die. Source: IBM]
Shaping Future Multicore Architectures
• The ILP Wall
  - Limited ILP in applications
• The Frequency Wall
  - Not much headroom
• The Power Wall
  - Dynamic and static power dissipation
• The Memory Wall
  - Gap between compute bandwidth and memory bandwidth
• Manufacturing
  - Non-recurring engineering costs
  - Time to market
The Frequency Wall
• Not much headroom left in the stage-to-stage times (currently 8-12 FO4 delays; FO4 is the delay of an inverter driving four identical inverters)
• Increasing frequency leads to the power wall
Vikas Agarwal, M. S. Hrishikesh, Stephen W. Keckler, Doug Burger. Clock rate versus IPC: the end of the road for conventional microarchitectures. In ISCA 2000
Options
• Increase performance via parallelism
  - On chip, this has been largely at the instruction/data level
• The 1990s through 2005 were the era of instruction-level parallelism
  - Single instruction multiple data/vector parallelism: MMX, SSE, vector co-processors
  - Out-of-order (OOO) execution cores
  - Explicitly Parallel Instruction Computing (EPIC)
• Have we exhausted the options within a thread?
The ILP Wall - Past the Knee of the Curve?
[Figure: performance vs. "effort" for scalar in-order, moderate-pipe superscalar/OOO, and very-deep-pipe aggressive superscalar/OOO designs. Going superscalar/OOO made sense (good ROI); going further gave very little gain for substantial effort. Source: G. Loh]
The ILP Wall
• Limiting phenomena for ILP extraction:
  - Clock rate: at the wall, each increase in clock rate has a corresponding CPI increase (branches, other hazards)
  - Instruction fetch and decode: at the wall, more instructions cannot be fetched and decoded per clock cycle
  - Cache hit rate: poor locality can limit ILP, and it adversely affects memory bandwidth
  - ILP in applications: serial fraction in applications
• Reality:
  - Limit studies cap IPC at 100-400 (using an ideal processor)
  - Current processors have an IPC of only 1-2
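The "serial fraction" limit is commonly quantified with Amdahl's law (named here; the slides do not cite it). A minimal sketch, assuming the parallel part scales perfectly over n units and the serial part does not scale at all:

```python
def amdahl_speedup(serial_fraction: float, n: int) -> float:
    """Speedup on n parallel units when `serial_fraction` of the work cannot be parallelized."""
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be in [0, 1]")
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n)


# A 10% serial fraction caps the speedup below 10x no matter how many units are added.
for n in (1, 4, 16, 1024):
    print(n, round(amdahl_speedup(0.1, n), 2))
```

The same bound applies whether the parallel units are issue slots, threads, or cores, which is why raising the granularity of parallelism only helps when the application actually exposes it.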
The ILP Wall: Options
• Increase granularity of parallelism
  - Simultaneous multithreading to exploit TLP
    - TLP has to exist → otherwise poor utilization results
  - Coarse-grain multithreading
  - Throughput computing
• New languages/applications
  - Data-intensive computing in the enterprise
  - Media-rich applications
The Memory Wall
[Figure: relative performance (log scale, 1 to 1000) vs. time. µProc ("Moore's Law"): 60%/yr.; DRAM: 7%/yr.; the processor-memory performance gap grows 50% / year]
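The 50%/year figure follows directly from the two improvement rates. A small sketch, assuming both rates stay constant and the two curves start at parity (the helper names are illustrative):

```python
def gap_growth_per_year(cpu_rate: float, dram_rate: float) -> float:
    """Annual growth of the CPU/DRAM performance ratio given annual improvement rates."""
    return (1 + cpu_rate) / (1 + dram_rate) - 1


def gap_after(years: int, cpu_rate: float = 0.60, dram_rate: float = 0.07) -> float:
    """Processor-memory performance ratio after `years`, starting from parity."""
    return ((1 + cpu_rate) / (1 + dram_rate)) ** years


print(f"gap grows {gap_growth_per_year(0.60, 0.07):.1%} per year")
print(f"after 10 years the gap is about x{gap_after(10):.0f}")
```

Because the ratio compounds, a decade at these rates opens a gap of well over an order of magnitude.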
The Memory Wall
• Increasing the number of cores increases the demanded memory bandwidth
• What architectural techniques can meet this demand?
[Figure: average access time vs. year]
The Memory Wall
• On-die caches are both area-intensive and power-intensive
  - StrongARM dissipates more than 43% of its power in caches
  - Caches incur huge area costs
• Larger caches never deliver the near-universal performance boost offered by frequency ramping (Source: Intel)
[Figure: die photos of the AMD dual-core Athlon FX (CPU0, CPU1) and the IBM Power5]
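One way to see why larger caches give diminishing returns is the average memory access time (AMAT) relation from Hennessy & Patterson, AMAT = hit time + miss rate × miss penalty. The cache options below are hypothetical numbers chosen only to show the effect of a slower hit time eating the miss-rate gain.

```python
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty (in cycles)."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical cache options: a bigger cache lowers the miss rate but is slower to hit.
options = {
    "small cache": amat(hit_time=1, miss_rate=0.05, miss_penalty=100),
    "2x cache": amat(hit_time=2, miss_rate=0.04, miss_penalty=100),
    "4x cache": amat(hit_time=3, miss_rate=0.035, miss_penalty=100),
}
for name, cycles in options.items():
    print(f"{name}: {cycles:.1f} cycles")
```

Real miss-rate curves depend on the workload; the point is only that hit time and area grow with capacity while the miss-rate benefit flattens.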
The Power Wall
• Power per transistor scales with frequency but also scales with Vdd (quadratically, in the dynamic term)
• Lower Vdd can be compensated for with increased pipelining to keep throughput constant
• Power per transistor is not the same as power per area → power density is the problem!
• Multiple units can be run at lower frequencies to keep throughput constant, while saving power
$$P = \alpha C V_{dd}^{2} f + V_{dd} I_{st} + V_{dd} I_{leak}$$
Improving Power/Performance
• Consider constant die size and decreasing core area each generation = more cores/chip
  - Effect of lowering voltage and frequency → power reduction
  - Increasing cores/chip → performance increase
$$P = \alpha C V_{dd}^{2} f + V_{dd} I_{st} + V_{dd} I_{leak}$$
Better power performance!
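A worked sketch of the trade-off above, using only the dynamic term of the power equation. It assumes the workload splits perfectly across cores (so n cores at f/n keep throughput constant) and that the slower clock permits a lower supply; the 0.7 × Vdd value is a hypothetical input, not a figure from the slides.

```python
def dynamic_power(alpha, c, vdd, f):
    """Dynamic term of P = alpha*C*Vdd^2*f + Vdd*I_st + Vdd*I_leak."""
    return alpha * c * vdd ** 2 * f


def many_core_power_ratio(n_cores: int, v_scale: float) -> float:
    """Power of n cores at f/n and v_scale*Vdd relative to one core at f and Vdd.

    Assumes perfect parallel speedup (same throughput) and identical cores;
    short-circuit and leakage terms are ignored.
    """
    single = dynamic_power(1.0, 1.0, 1.0, 1.0)
    multi = n_cores * dynamic_power(1.0, 1.0, v_scale, 1.0 / n_cores)
    return multi / single


print(f"2 cores @ f/2, 0.7*Vdd: x{many_core_power_ratio(2, 0.7):.2f} power, same throughput")
```

Without a voltage reduction (v_scale = 1.0) the two slower cores save nothing; the gain comes entirely from the quadratic Vdd term.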
Special Purpose Hardware (a.k.a. Accelerator)
[Figure: TCP/IP Offload Engine (TOE) die, 2.23 mm × 3.54 mm, 260K transistors]
• Opportunities: network processing engines, MPEG encode/decode engines, speech engines
• Special-purpose HW → best MIPS/Watt
Source: Shekhar Borkar, Intel Corp.
Moore’s Law reinterpreted
n Number of cores per chip can double every two years
n Clock speed will not increase (possibly decrease)
n Need to deal with systems with millions of concurrent threads
n Need to deal with inter-chip parallelism as well as intra-chip parallelism
The Economics of Manufacturing
• Where are the costs of developing next-generation processors?
  - Design costs
  - Manufacturing costs
• What type of chip-level solutions does the economics imply?
• Assessing the implications of Moore's Law is an exercise in mass production
Valued Performance: SoC (System-on-a-Chip)
• Special-purpose hardware → more MIPS/mm²
Target | Die Area | Power | Performance
General Purpose | 2× | 2× | ~1.4×
Multimedia Kernels | < 10% | < 10% | 1.5 ~ 4×
Software Virtualization - Disruptive Force for SoC Design
[Figure: traditional design cycle for a typical SoC over 24 months (12-month mark shown): IC design, manufacturing, test & qualification, IC validation, engineering samples, firmware development, driver development, reference design, customer evaluation, design win, software development, production samples, production]
[Figure: "virtual" design cycle for a typical SoC over 24 months (12-month mark shown): virtual design, prototype evaluation, customer application development, IC design, manufacturing, testing & qualification, IC validation, engineering samples, software development, driver development, customer evaluation, design win, production samples, production]
Benefits of the virtual design cycle:
Ø Reduced time to market
Ø Reduced risk
Ø More effective collaboration between the IC vendor and customer
Ø Key enabler for SoC startups
Source: Gartner
What is SoC?
• An SoC is a system on an IC that integrates software and hardware Intellectual Property (IP) using more than one design methodology for the purpose of defining the functionality and behavior of the proposed system.
• The designed system is application specific.
• Typical applications of SoC:
  - consumer devices
  - networking
  - communications
  - other segments of the electronics industry
[Figure: an SoC combining µP cores, memory, video unit, graphics, comms, DSP, custom logic, and software]
System on Chip Benefits
[Figure: up to now, a collection of chips (CPU, DSP, IPsec, memory, USB hub, X) on a board, typical cost $70; now, a collection of cores (processor, co-processor, IP cores) on a single chip, typical cost $10]
Typical approach:
• Define requirements
• Design with off-the-shelf chips
  - at 0.5 year mark: first prototypes
  - 1 year: ship with low margins/loss; start ASIC integration
  - 2 years: ASIC-based prototypes
  - 2.5 years: ship, make profits (with competition)
With SoC:
• Define requirements
• Design with off-the-shelf cores
  - at 0.5 year mark: first prototypes
  - 1 year: ship with high margin and market share
SoC Architecture
• Hardware architecture
  - CPU, hardware IP
  - Diverse memory elements
  - I/O interface
  - Bus
• SoC complexity is increasing
  - # processing elements grows
  - Communication architecture
    - Switched bus
    - NoC (Network on Chip)
Intrinsix AMBA SoC Platform, Intrinsix Co. http://www.intrinsix.com/intrinsix-ip/soc-ip/amba.htm
Chip Design is Now System Design
[Figure: evolution from timing-driven design and block-based design to platform-based design and plug-and-play System on a Chip; complex ASIC with a few IPs, ASIC on DSM; platform-based design moving into the mainstream; example blocks: logic, µP core, SRAM, ROM, soft I/F/IP, DSP, MPEG, RAM, cache, I/F]
Basic SoC Architecture – System example
[Figure: smartphone system configuration around a mobile SoC with PMIC + audio codec; main TFT LCD & TSP; GPS; debug; UART0 (CTS/RTS); UART1 (CTS/RTS); UART2/IrDA v1.1; USB Host 2.0; USB OTG 2.0; IIS (3-ch); HS-MMC/SD/SDIO (3-ch); IIC/PWI; TFT LCDC; SMC; NAND flash I/F (SLC/MLC NAND); SRAM/ROM/NOR; HDMI; video codec; AC97 (1-ch)/PCM (2-ch); keypad (16x8); QWERTY key; HS-SPI (3-ch); modem I/F and modem; 2D engine; 3D engine; JPEG codec; 12MP ISP; Li-Ion system power; DRAMC0 (333) for mDDR/LPDDR1/LPDDR2; DVB-H/TDMB/WiFi; WiBro/WiFi]
Basic SoC Architecture – Issues in SoC
[Figure: mobile SoC block diagram with a ×64 multi-layer AHB/AXI bus connecting the CPU core (core & L1 cache, L2 cache), multimedia acceleration, memory subsystem, connectivity, system peripherals, and power management; blocks shown include camera IF, video codec, graphics engine, JPEG codec, TV out, TFT LCD controller w/ DSI, LCDC/OSD, color TFT LCD, mobile SDRAM/mobile DDR SDRAM, LPDDR1/LPDDR2, OneDRAM, SRAM/ROM/NOR, NAND flash, UART, USB, IrDA, I2S, I2C, GPIO, HS-MMC/SD, SPI, modem I/F, ATA, AC97/PCM, keypad, ADC & touch screen, PLL, RTC, timer w/ PWM, watchdog timer, DMA, dynamic voltage frequency scaling, crypto accelerator, secure ROM, secure RAM]
• CPU choice & cache structure decision
• ISP spec decision & architecture design
• MFC optimization & architecture design
• Visual system architecture design
• Memory system design
• Memory controller optimization
• File system optimization
• Security system architecture design
• Communication system performance optimization
• Low-power audio playback architecture design
• System low-power architecture design
• System bus architecture design
• Physical design balancing
• Clock & reset architecture design
Performance Architecture – Performance factors
Basic performance factors: core performance, memory latency, bandwidth
[Figure: basic SoC block diagram: CPU, multimedia, multilevel interconnect bus, DRAM, peripherals & other memories]
• CPU subsystem (core performance)
  - Cortex-A15 / Cortex-A7
  - L2 cache
  - Enhancing CPU clock speed
• Multimedia subsystem (core performance, latency, bandwidth)
  - Logic parallelism
  - Prefetching
  - Optimizing access pattern
  - Buffering & caching
• Memory subsystem, esp. DRAM (latency, bandwidth)
  - Reducing controller latency
  - Advanced precharge scheme
  - Optimizing scheduling scheme
  - Optimizing memory structure for video
  - Enhancing I/O speed
• Bus subsystem (latency, bandwidth)
  - Reducing arbitration delay
  - Reducing interconnect latency
  - Enhancing bus clock speed
  - Optimizing arbitration scheme
The Spectrum of Architectures
[Figure: spectrum from customization fully in hardware (custom ASIC; hardware development via synthesis) to customization fully in software (microprocessor; software development via compilation). In between: structured ASIC, FPGA, polymorphic computing architectures, tiled architectures, and fixed + variable ISA processors. Toward hardware: increasing design NRE effort and time to market; toward software: decreasing customization. Examples shown: Tensilica, Stretch Inc., PACT, PICOChip, LSI Logic, Leopard Logic, MONARCH, SM, RAW, TRIPS, Xilinx, Altera]
Interlocking Trade-offs
[Figure: power, memory, frequency, and ILP as interlocking trade-offs, linked by bandwidth, dynamic power, dynamic penalties, and leakage power]
Multi-core Architecture Drivers
• Addressing ILP limits
  - Multiple threads
  - Coarse-grain parallelism → raise the level of abstraction
• Addressing frequency and power limits
  - Multiple slower cores across technology generations
  - Scaling via increasing the number of cores rather than frequency
  - Heterogeneous cores for improved power/performance
• Addressing memory system limits
  - Deep, distributed cache hierarchies
  - OS replication → shared memory remains dominant
• Addressing manufacturing issues
  - Design and verification costs → replication → the network becomes more important!
3D IC (System-in-Package) is the Next Revolution
[Figure: 3D stack of CMOS, memory, RF, MEMS, and photonics layers]
Better Performance, Smaller Size, Lower Cost
Ø Massive bandwidth
Ø Reduced interconnect delays
Ø Power reduction
Ø Higher functionality/space
Ø Heterogeneous integration
Ø 3D maximizes space utilization
Ø Lower cost vs. next-gen device
Ø Reuse of proven SIP
Traditional IC Design Flow
Conventional Design Flow: Circular (Gajski's) Y-Chart
[Figure: Gajski's Y-chart with three domains. Behavior domain: systems, algorithms, register transfers, logic, transfer functions. Structure domain: processors; ALUs, RAM, etc.; gates, flip-flops, etc.; transistors. Physical domain: physical partitions, floorplans, module layout, cell layout, transistor layout. Design levels: algorithm & system design, structural & logic design, transistor-level design, layout design]
Conventional Design Flow: Digital (VLSI) System
• Top → Down
  - Front-end design:
    - System-level design/simulation
    - Behavioral-level design/simulation
    - Register Transfer Level (RTL) design/simulation
    - Logic synthesis
    - Logic-level design/simulation (gate level + switch level)
  - Post-end design:
    - Layout design
    - Post-layout verification
Conventional Design Flow: Analog/RF System
• Bottom → Up
  - Front-end design:
    - System integration simulation
    - Architecture decision
    - Function block design
    - Circuit structure design/simulation
    - Transistor/component selection
  - Post-end design:
    - Layout design
    - Post-layout verification
Mixed-Signal Top-Down Design Flow
• Front-end design:
  - System design/simulation
  - Architecture decision
  - Function block design/simulation
  - Circuit structure design/simulation
  - Transistor/component selection
• Post-end design:
  - Layout design
  - Post-layout verification
• Using a mixed-signal simulator
• Not really for “performance” prediction but for “function” prediction!
A Complete Top-Down Design Methodology:
• System simulation, partitioned into digital blocks and analog blocks
• Digital blocks: RTL design, synthesis, gate netlist, place & routing
• Analog blocks: block design, circuit design, layout design
• Layout integration
• Mixed-signal simulator
Traditional Taxonomy
• Front end:
  - Behavioral-level design
  - Logic design and simulation
  - Logic synthesis
  - Logic partitioning, die planning
  - Simulation, floorplanning
  - Function verification, timing verification
  - Test generation
• Back end:
  - IO pad placement
  - Power/ground stripes and rings routing
  - Global placement
  - Detail placement
  - Clock tree synthesis and routing
  - Global routing
  - Detail routing
  - Extraction and delay calculation, timing verification
  - LVS, DRC, ERC
Levels of VLSI Design in a Traditional Flow
• Specification
  - what the system is supposed to do
• Architecture
  - high-level design of components
  - state defined
  - logic partitioned into major blocks
• Logic
  - gates, flip-flops, and the connections between them
• Circuit
  - transistor circuits to realize logic elements
• Device
  - behavior of individual circuit elements
• Layout
  - geometry used to define and connect circuit elements
• Process
  - steps used to define circuit elements
[Figure: flow from architecture design and high-level synthesis to RTL, synthesis, placement, routing, extraction and timing verification, GDSII, and manufacturing, with verification alongside]
High-Level Synthesis (Behavior → RTL)
• Scheduling
  - Assignment of each operation to a time slot corresponding to a clock cycle or time interval
• Resource allocation
  - Selection of the types of hardware components and the number of each type to be included in the final implementation
• Module binding
  - Assignment of operations to the allocated hardware components
• Controller synthesis
  - Design of the control style and clocking scheme
• Compilation
  - of the input specification language into the internal representation
• Parallelism extraction
  - usually via data-flow analysis techniques
• …
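The first three HLS tasks can be sketched on a tiny data-flow graph. This uses ASAP (as-soon-as-possible) scheduling, one common textbook policy; the slides do not name a specific algorithm. It assumes every operation takes one control step, no resource limit during scheduling, and an acyclic graph; all names are hypothetical.

```python
from collections import defaultdict


def asap_schedule(ops, deps):
    """Scheduling: give each operation the earliest step after all its predecessors.

    ops: {op_name: op_type}; deps: {op_name: [predecessor op_names]} (acyclic).
    Assumes every operation takes exactly one control step (clock cycle).
    """
    step = {}

    def visit(op):
        if op not in step:
            step[op] = 1 + max((visit(p) for p in deps.get(op, [])), default=0)
        return step[op]

    for op in ops:
        visit(op)
    return step


def allocate(ops, step):
    """Resource allocation: units of each type = peak number used in any one step."""
    usage = defaultdict(lambda: defaultdict(int))
    for op, op_type in ops.items():
        usage[op_type][step[op]] += 1
    return {op_type: max(per_step.values()) for op_type, per_step in usage.items()}


def bind(ops, step):
    """Module binding: give each operation a free unit of its type within its step."""
    next_free = defaultdict(int)  # (op_type, step) -> next unused unit index
    binding = {}
    for op in sorted(ops, key=lambda o: (step[o], o)):
        key = (ops[op], step[op])
        binding[op] = f"{ops[op]}{next_free[key]}"
        next_free[key] += 1
    return binding


# z = (a*b + c*d) * e as a data-flow graph.
ops = {"m1": "mul", "m2": "mul", "a1": "add", "m3": "mul"}
deps = {"a1": ["m1", "m2"], "m3": ["a1"]}
step = asap_schedule(ops, deps)
print(step, allocate(ops, step), bind(ops, step))
```

Here two multipliers are needed only because m1 and m2 share step 1; a resource-constrained scheduler (e.g., list scheduling) could trade that second multiplier for an extra cycle.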
Architecture Level Floorplanning
• Defines the basic chip layout architecture
  - Define the standard cell rows and I/O placement locations
  - Place RAMs and other macros
  - Separate gate array, memory, analog, and RF blocks
  - Define power distribution structures such as rings and stripes
  - Allow space for clock, major buses, etc.
• Rules of thumb for cell density are used to initially calculate design size
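A rough sketch of the density rule of thumb: divide standard-cell area by a target utilization, add macro area, and assume a square core inside an I/O ring. Every input here (gate count in NAND2 equivalents, NAND2 area, utilization, macro area, ring width) is a hypothetical planning number, not foundry data.

```python
import math


def estimate_die(gate_count: int, nand2_area_um2: float, utilization: float,
                 macro_area_mm2: float = 0.0, io_ring_mm: float = 0.0):
    """Rough square-die estimate from a cell-density rule of thumb.

    core area = standard-cell area / target utilization + macro (RAM, analog) area.
    """
    cell_area_mm2 = gate_count * nand2_area_um2 / 1e6   # um^2 -> mm^2
    core_area_mm2 = cell_area_mm2 / utilization + macro_area_mm2
    core_side_mm = math.sqrt(core_area_mm2)             # assume a square core
    die_side_mm = core_side_mm + 2 * io_ring_mm          # I/O ring on every side
    return core_area_mm2, die_side_mm


core, side = estimate_die(gate_count=2_000_000, nand2_area_um2=0.5,
                          utilization=0.7, macro_area_mm2=1.5, io_ring_mm=0.2)
print(f"core ~{core:.2f} mm^2, die ~{side:.2f} mm per side")
```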
Logic Synthesis
• Conversion of RTL to a gate-level netlist
  - Targeted to a foundry-specific library
  - Can be performed hierarchically (block by block)
• Timing-driven
  - Clock information
  - Primary input arrival times, primary output required times
  - Input driving cells, output loading
  - False paths, multi-cycle paths
• Interconnect delay may be calculated based on a “wireload model”, which uses fanout to estimate wire length and capacitance, and hence delay
• Clock parameters (insertion delay, skew, jitter, etc.) are assumed to be attainable later in place and route
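A minimal sketch of how a wireload model turns fanout into a delay estimate. The table, slope, and per-unit capacitance are hypothetical values (Liberty-format libraries describe such tables in `wire_load` groups), and the delay is a lumped RC estimate that ignores wire resistance.

```python
# Hypothetical wireload model for one block size: fanout -> estimated net length.
FANOUT_LENGTH = {1: 10.0, 2: 18.0, 3: 25.0, 4: 31.0}   # length units (illustrative)
SLOPE = 6.0            # extra length per fanout beyond the table (extrapolation)
CAP_PER_UNIT = 0.002   # pF per length unit (illustrative)


def net_length(fanout: int) -> float:
    """Look up the estimated wire length, extrapolating linearly past the table."""
    if fanout < 1:
        raise ValueError("fanout must be at least 1")
    if fanout in FANOUT_LENGTH:
        return FANOUT_LENGTH[fanout]
    max_fo = max(FANOUT_LENGTH)
    return FANOUT_LENGTH[max_fo] + SLOPE * (fanout - max_fo)


def net_delay_ns(fanout: int, r_drive_kohm: float, pin_cap_pf: float) -> float:
    """Lumped RC estimate: driver resistance x (wire cap + sink pin caps); kOhm * pF = ns."""
    wire_cap = net_length(fanout) * CAP_PER_UNIT
    return r_drive_kohm * (wire_cap + fanout * pin_cap_pf)


for fo in (1, 4, 8):
    print(fo, net_length(fo), round(net_delay_ns(fo, r_drive_kohm=1.0, pin_cap_pf=0.005), 4))
```

Because the estimate depends only on fanout, two nets with the same fanout but very different routed lengths get the same delay, which is why these numbers are replaced by extracted parasitics after place and route.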
Formal Verification
• The RTL description and gate-level netlist are compared to verify functional equivalence, thereby verifying the synthesis results
• Formal methods
  - Graph isomorphism
  - Binary Decision Diagram (BDD)
• Emerging technology that supplements the more traditional gate-level simulation approach
• FV is also performed after place-and-route (if the gate netlist changes)
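To make "functional equivalence" concrete, the sketch below compares an RTL-style model with a gate-level model of a full adder by exhaustive enumeration of all input combinations. This is a swapped-in technique that only works for tiny input counts; production equivalence checkers use BDDs, SAT, and structural matching as listed above. All function names are hypothetical.

```python
from itertools import product


def rtl_full_adder(a, b, cin):
    """Behavioral (RTL-style) model: arithmetic sum split into sum and carry bits."""
    total = a + b + cin
    return total & 1, total >> 1


def gate_full_adder(a, b, cin):
    """Gate-level model: XOR for sum, majority function for carry."""
    s = a ^ b ^ cin
    cout = (a & b) | (a & cin) | (b & cin)
    return s, cout


def buggy_full_adder(a, b, cin):
    """A faulty netlist: the a & cin term of the carry is missing."""
    s = a ^ b ^ cin
    cout = (a & b) | (b & cin)
    return s, cout


def check_equivalence(spec, impl, n_inputs):
    """Return the first input vector where the outputs differ, or None if equivalent."""
    for bits in product((0, 1), repeat=n_inputs):
        if spec(*bits) != impl(*bits):
            return bits
    return None


print(check_equivalence(rtl_full_adder, gate_full_adder, 3))    # None: equivalent
print(check_equivalence(rtl_full_adder, buggy_full_adder, 3))   # a counterexample
```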
RTL Simulation
• RTL code, written in Verilog, VHDL, or a combination of both, is simulated to verify functional correctness
• Testbenches apply input stimulus to the design
• Several methods are used to verify the outputs
  - Self-checking testbenches automatically verify output correctness and report mismatches
  - Results can be stored in a file and compared to previous results
  - Waveform displays can be used to interactively verify the outputs
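The structure of a self-checking testbench can be sketched language-neutrally. Here Python functions stand in for the RTL design under test and for the golden reference model (a real testbench would drive Verilog/VHDL in a simulator); the 8-bit adder and all names are hypothetical.

```python
import random


def ref_add8(a: int, b: int):
    """Golden reference model: 8-bit sum and carry-out."""
    total = a + b
    return total & 0xFF, total >> 8


def dut_add8(a: int, b: int):
    """Stand-in for the design under test: a bit-by-bit ripple-carry adder."""
    carry, result = 0, 0
    for i in range(8):
        x, y = (a >> i) & 1, (b >> i) & 1
        result |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return result, carry


def buggy_dut_add8(a: int, b: int):
    """A deliberately broken DUT that drops the carry-out."""
    return (a + b) & 0xFF, 0


def run_testbench(dut, n_random=200, seed=1):
    """Apply directed + random stimulus, compare against the reference, collect mismatches."""
    rng = random.Random(seed)
    stimulus = [(0, 0), (255, 1), (255, 255), (128, 128)]   # directed corner cases
    stimulus += [(rng.randrange(256), rng.randrange(256)) for _ in range(n_random)]
    mismatches = []
    for a, b in stimulus:
        got, expected = dut(a, b), ref_add8(a, b)
        if got != expected:
            mismatches.append((a, b, got, expected))
    return mismatches


print("good DUT mismatches:", len(run_testbench(dut_add8)))
bad = run_testbench(buggy_dut_add8)
print("buggy DUT mismatches:", len(bad), "first:", bad[0])
```

The directed corner cases come first so that a carry bug is caught deterministically, while the seeded random vectors keep runs reproducible.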
Gate-Level Simulation
• Covers both functionality and timing
• Correctness is only as good as the test vectors used
• Especially critical for non-synchronous designs and for verifying false-path and multi-cycle-path constraints
• Cell timing is included in the simulation models, and interconnect delay is passed from the synthesis run
• Worst-case PVT conditions are used to analyze setup violations, and best-case PVT conditions are used to analyze hold violations
  - PVT = Process, Voltage, Temperature
Static Timing Analysis
• Verifies that the design operates at the desired frequency
  - Implicitly assumes correct timing constraints (!), e.g., boundary conditions
• Timing constraints are similar to those used by logic synthesis
• Verifies setup and hold times at FF inputs; can also check timing from and to PIs and POs; can also check point-to-point delay values (with blocking of pins, etc.)
• As with gate-level simulation, both best- and worst-case analysis is performed
• Typically performed on a full-chip (not block) basis
  - May require modified constraints for inter-block issues: multiple clock domains, multi-cycle paths, etc.
• For compatibility with a timing-driven layout flow, it helps to have a simple/single set of constraints
• Other issues: incremental analysis, …
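A minimal STA sketch for one register-to-register path: late (max) arrival times at a slow corner give setup slack, and early (min) arrival times at a fast corner give hold slack. The netlist, delays, and the 1.2/0.8 corner derating factors are hypothetical; it assumes an ideal clock (no skew or jitter) and needs Python 3.9+ for `graphlib`.

```python
from graphlib import TopologicalSorter


def arrival_times(edges, launch, pick):
    """Propagate arrival times through an acyclic combinational timing graph.

    edges:  list of (from_node, to_node, delay_ns) timing arcs
    launch: {start_node: arrival_ns}, e.g. clock-to-Q at launching flip-flops
    pick:   max for late (setup) analysis, min for early (hold) analysis
    """
    preds = {}
    for u, v, d in edges:
        preds.setdefault(u, [])
        preds.setdefault(v, []).append((u, d))
    order = TopologicalSorter({n: [u for u, _ in p] for n, p in preds.items()}).static_order()
    arrival = dict(launch)
    for node in order:
        if node not in arrival:
            arrival[node] = pick(arrival[u] + d for u, d in preds[node])
    return arrival


# Hypothetical path FF1 -> FF2 (delays in ns at the typical corner).
EDGES = [("FF1/Q", "g1", 0.3), ("g1", "g2", 0.5), ("g2", "FF2/D", 0.4),
         ("FF1/Q", "g3", 0.2), ("g3", "FF2/D", 0.1)]
CLK_TO_Q = 0.2


def check(period, t_setup, t_hold, slow=1.2, fast=0.8):
    """Setup slack at a slow (worst-case) corner, hold slack at a fast (best-case) corner."""
    late = arrival_times([(u, v, d * slow) for u, v, d in EDGES], {"FF1/Q": CLK_TO_Q * slow}, max)
    early = arrival_times([(u, v, d * fast) for u, v, d in EDGES], {"FF1/Q": CLK_TO_Q * fast}, min)
    setup_slack = period - t_setup - late["FF2/D"]
    hold_slack = early["FF2/D"] - t_hold
    return setup_slack, hold_slack


setup_slack, hold_slack = check(period=2.0, t_setup=0.1, t_hold=0.05)
print(f"setup slack {setup_slack:.2f} ns, hold slack {hold_slack:.2f} ns")
```

The long path through g1 and g2 sets the setup slack, while the short path through g3 sets the hold slack, which is why both corners must be analyzed.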
A More Detailed Design Flow
[Figure: flow steps: requirements, design specs, Fnl. RTL design, floorplan & PG, synthesis, placement, physical re-synthesis, clock distribution, route and scan re-order, timing analysis and IPO, Fnl., pwr., SI ECO, ERC/DRC/LVS, tape-out; inputs: library + CWLM, constraints. Step annotations: architectural optimization (timing), inter-group buses, bandwidth, clock, SI, test, validation; floorplanning and custom WLM, power distribution (internal, I/O), I/O driver and pad-ring design, board-level timing, SI; row definitions, placement of cells, congestion analysis; placement-based re-synthesis, noise minimization, isolation, clock distribution; full routing, scan stitching and re-ordering; full RC back-annotation, hierarchical timing, electrical and SI analysis, and IPO/ECO]
Source: A. Khan, Simplex/Altius
SoC Design
Paradigm Shift in SoC Design
[Figure: from system on a board to System on a Chip]
Evolutionary Problems
• Emerging new technologies:
  - Greater complexity
  - Increased performance
  - Higher density
  - Lower power dissipation
• Key challenges
  - Improve productivity
  - HW/SW codesign
  - Integration of analog & RF IPs
  - Improved DFT
• Evolutionary techniques:
  - IP (Intellectual Property) based design
  - Platform-based design
Migration from ASICs to SoCs
• ASICs are logic chips designed by end customers to perform a specific function for a desired application.
• ASIC vendors supply libraries for each technology they provide. In most cases, these libraries contain predesigned and pre-verified logic circuits.
• ASIC technologies are:
  - gate array
  - standard cell
  - full custom
Migration from ASICs to SoCs
n In the mid-1990s, ASIC technology evolved from a chip-set philosophy to an embedded-cores-based system-on-a-chip concept.
n An SoC is an IC designed by stitching together multiple stand-alone VLSI designs to provide full functionality for an application.
• An SoC is composed of predesigned models of complex functions known as cores (also called intellectual property blocks, virtual components, or macros) that serve a variety of applications.
SoC Design Challenges
n Why does it take longer to design SoCs compared to traditional ASICs?
• We must examine the factors influencing the degree of difficulty and the Turn Around Time (TAT) (the time taken from gate-level netlist to metal-mask-ready stage) for designing ASICs and SoCs.
• For an ASIC, the following factors influence TAT:
  - Frequency of the design
  - Number of clock domains
  - Number of gates
  - Density
  - Number of blocks and sub-blocks
• The key factor that influences TAT for SoCs is system integration (integrating different silicon IPs on the same IC).
SoCs vs. ASICs
• SoC is not just a large ASIC
  - Architectural approach involving significant design reuse
  - Addresses the cost and time-to-market problems
• SoC methodology is an incremental step over ASIC methodology
• SoC design is significantly more complex
  - Need cross-domain optimizations
  - IP reuse and platform-based design increase productivity, but not enough
  - Even with extensive IP reuse, many of the ASIC design problems remain, plus many more ...
  - The productivity increase is far from closing the design gap
The challenge for designers is not whether to adopt reuse, but how to employ it effectively.
Design for Reuse
• To overcome the design gap, design reuse (the use of pre-designed and pre-verified cores, or the reuse of existing designs) becomes a vital concept in design methodology.
• An effective block-based design methodology requires an extensive library of reusable blocks, or macros, and it is based on the following principles:
  - The macro must be extremely easy to integrate into the overall chip design.
  - The macro must be so robust that the integrator has to perform essentially no functional verification of the macro's internals.
Design for Reuse
• To be fully reusable, the hardware macro must be:
  - Designed to solve a general problem
    - easily configurable to fit different applications
  - Designed for use in multiple technologies
    - For soft macros, this means that the synthesis scripts must produce satisfactory quality of results with a variety of libraries. For hard macros, this means having an effective porting strategy for mapping the macro onto new technologies.
  - Designed for simulation with a variety of simulators
    - Good design reuse practices dictate that both a Verilog and a VHDL version of each model and verification testbench should be available, and they should work with all the major commercial simulators.
  - Designed with standards-based interfaces
    - Unique or custom interfaces should be used only if no standards-based interface exists.
Design for Reuse – cont.
• To be fully reusable, the hardware macro must be:
  - Verified independently of the chip in which it will be used
    - Often, macros are designed and only partially tested before being integrated into a chip for verification. Reusable designs must have full, stand-alone testbenches and verification suites that afford very high levels of test coverage.
  - Verified to a high level of confidence
    - This usually means very rigorous verification as well as building a physical prototype that is tested in an actual system running real software.
  - Fully documented in terms of appropriate applications and restrictions
    - In particular, valid configurations and parameter values must be documented. Any restrictions on configurations or parameter values must be clearly stated. Interfacing requirements and restrictions on how the macro can be used must be documented.
Resources vs. Number of Uses
Intellectual Property
• Utilizing predesigned modules makes it possible:
  - To avoid reinventing the wheel for every new product,
  - To accelerate the development of new products,
  - To assemble various blocks of a large ASIC/SoC quite rapidly,
  - To reduce the risk of failure that comes with designing and verifying a block for the first time.
• These predesigned modules are commonly called Intellectual Property (IP) cores or Virtual Components (VC).
Intellectual Property Categories
• IP cores are classified into three distinct categories:
  - Hard IP cores consist of hard layouts using particular physical design libraries and are delivered as mask-level designed blocks (GDSII format). The integration of hard IP cores is quite simple, but hard cores are technology dependent and provide minimum flexibility and portability in reconfiguration and integration.
  - Soft IP cores are delivered as RTL VHDL/Verilog code to provide functional descriptions of IPs. These cores offer maximum flexibility and reconfigurability to match the requirements of a specific design application, but they must be synthesized, optimized, and verified by their user before integration into designs.
  - Firm IP cores bring the best of both worlds and balance the high performance and optimization properties of hard IPs with the flexibility of soft IPs. These cores are delivered as netlists targeted to specific physical libraries, after synthesis but without physical layout.
Trade-offs among Soft, Firm, and Hard cores
[Figure: soft core, firm core, and hard core on a spectrum: reusability, portability, and flexibility favor soft cores; predictability, performance, and time to market favor hard cores]
Comparison of Different IP Formats
IP Format | Representation | Optimization | Technology | Reusability
Hard | GDSII | Very High | Technology Dependent | Low
Soft | RTL | Low | Technology Independent | Very High
Firm | Target Netlist | High | Technology Generic | High
The Design Process of SoCs
• SoC designs are made possible by deep submicron technology. This technology presents a whole set of design challenges, including:
  - Interconnect delays
  - Clock and power distribution
  - Placement and routing of millions of gates
• These physical design problems can have a significant impact on the functional design of SoCs and on the design process itself.
• The first step in system design is specifying the required functionality.
• The second step is to transform the system functionality into an architecture, which defines the system implementation by specifying the number and types of components and the connections between them.
Define Hardware-Software Codesign
• Hardware-Software Codesign is the concurrent and co-operative design of the hardware and software components of a system.
• The SoC design process is a hardware-software codesign process in which design productivity is achieved by design reuse.
• The design process is the set of design tasks that transform an abstract specification model into an architectural model.
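HW/SW partitioning, one of the codesign tasks, can be illustrated with a simple greedy heuristic: move the tasks that save the most time per unit of hardware area into dedicated hardware until an area budget is used up. This is a hypothetical sketch, not the method of any particular tool; it assumes tasks run sequentially, ignores HW/SW communication overhead, and uses made-up task numbers.

```python
from dataclasses import dataclass


@dataclass
class Task:
    name: str
    sw_time: float   # execution time on the processor core (e.g., microseconds)
    hw_time: float   # execution time as a dedicated hardware block
    hw_area: float   # area cost of that hardware block (e.g., mm^2), must be > 0


def partition(tasks, area_budget):
    """Greedy HW/SW partitioning by time saved per unit area.

    Assumes sequential execution (total time is a plain sum) and no
    communication cost between hardware and software parts.
    """
    hw, used = set(), 0.0
    ranked = sorted(tasks, key=lambda t: (t.sw_time - t.hw_time) / t.hw_area, reverse=True)
    for t in ranked:
        if t.sw_time > t.hw_time and used + t.hw_area <= area_budget:
            hw.add(t.name)
            used += t.hw_area
    total = sum(t.hw_time if t.name in hw else t.sw_time for t in tasks)
    return hw, used, total


tasks = [
    Task("fft", sw_time=400, hw_time=40, hw_area=2.0),
    Task("crc", sw_time=50, hw_time=5, hw_area=0.2),
    Task("control", sw_time=30, hw_time=25, hw_area=1.0),
    Task("jpeg", sw_time=900, hw_time=100, hw_area=4.0),
]
print(partition(tasks, area_budget=4.5))
```

A greedy pass is fast enough for rapid design space exploration, but it can miss better combinations (here fft plus crc would also fit); tools typically refine such a starting point with estimators and iterative search.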
SoC Co-design Flow
[Figure: design specification, HW/SW partitioning, HW (Verilog, VHDL) synthesis and SW (C++) compiler, supported by an estimator, an architecture description language, verification, co-verification, and an IP library; goals: rapid design space exploration, quality tool-kit generation, design reuse]
Design Process
[Figure: a canonical or generic form of an SoC design: processor core, processors P1 and P2, memory M1, on-chip memory, synthesized HW, interface, off-chip memory]
These chips have:
• one (several) processors
• large amounts of memory
• bus-based architectures
• peripherals
• coprocessors
• and I/O channels
SoC Typical Design Steps
[Figure: SoC design schedule, time in weeks: top-level design, unit block design, unit block verification, integration and synthesis (trial netlists), system-level verification, timing convergence & verification, fabrication, DVT prep, DVT; phase durations shown: 4, 14, 5, 4, 4, and 2 weeks; time to mask order: 24 weeks; 33 weeks also marked]
• With the increasing complexity of ICs and decreasing geometries, the IC vendor steps of placement, layout, and fabrication are unlikely to be greatly reduced.
• In fact, there is a greater risk that the timing convergence steps will involve more iterations.
• Need to reduce the time before the vendor steps.
• Need to consider layout issues up front.
ø DVT: Design Validation Test
SoC Typical Design Steps
• The SoC architecture is already defined: flexible to scale in frequency and complexity, and it allows new IP cores and new technology to be integrated.
• Separate the design of the reusable IP from the design of the SoC. Build the SoC from a library of tested IP.
• Unit design consists only of any additional core features, or of wrapping new IP to enable integration.
• Reusable IP is purchased from external sources, developed from in-house designs, or designed as a separate project off the critical SoC development path.
SoC Methodology
SoC Methodology Evolving ...
How to Design an SoC
System on Chip - Testing
[Figure: SoC test architecture: I/O pads around the die, IEEE 1149.1 TAP controller, user-defined logic, CPU core, self-test control, legacy core, IP hard core, DSP core, memory array, interface control, embedded DRAM]
Main SoC testing challenges:
• Core-level test: embedded cores are tested as part of the system
• Test access: because there is no physical access to the core peripheries, an electronic access mechanism is required
• SoC-level test: the SoC test is a single composite test comprising the individual core tests, the UDL (User-Defined Logic) test, and test scheduling
Test data volume for core-based SoC designs is very high.
• New techniques are required to reduce testing time, test cost, and the memory requirements of the automatic test equipment (ATE)
• SoCs are complex designs combining logic, memory, and mixed-signal circuits in a single IC
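A back-of-the-envelope scan-test estimate shows why test time and ATE memory become a problem. It uses the common approximation test cycles ≈ (patterns + 1) × chain length + patterns (shift overlapped with unload, plus one capture cycle per pattern), and counts stimulus plus expected-response bits without any compression. The design size, pattern count, and 50 MHz shift clock are hypothetical.

```python
def scan_test_cost(patterns: int, scan_cells: int, chains: int, shift_mhz: float):
    """Rough scan-test estimate for balanced scan chains, without compression.

    Returns (chain_length, test_time_seconds, ate_data_mib).
    """
    chain_length = -(-scan_cells // chains)              # ceiling division
    cycles = (patterns + 1) * chain_length + patterns    # overlapped shift + capture
    seconds = cycles / (shift_mhz * 1e6)
    volume_bits = 2 * patterns * scan_cells              # stimulus + expected response
    return chain_length, seconds, volume_bits / 8 / 2**20


# Hypothetical core-based SoC: 1M scan cells, 10k patterns, 50 MHz shift clock.
for chains in (16, 256):
    length, t, mib = scan_test_cost(10_000, 1_000_000, chains, 50)
    print(f"{chains} chains: length {length}, {t:.2f} s, {mib:,.0f} MiB of ATE data")
```

More chains shorten test time, but the data volume is unchanged, which is why test data compression and on-chip test access schemes are needed in addition.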
Summary
• A System on Chip (SoC) is an integrated circuit that implements most or all of the functions of a complete electronic system.
• Four vital areas of SoC:
  - Higher levels of abstraction
  - IP and platform reuse
  - IP creation: ASIPs, interconnect, and algorithms
  - Earlier software development and integration