Upload
others
View
2
Download
0
Embed Size (px)
Citation preview
Machine Learning: An enabling technology for electronics modeling and design optimizationELYSE ROSENBAUM February 1, 2017Electrical & Computer EngineeringUniversity of Illinois at Urbana-Champaign
What is Machine Learning?
• Not distinguished from AI• Intelligent machines
‒With beyond-human capabilities• Hollywood’s vision:
The View from Popular Culture
3
Photo from Wikipedia
Malevolent
Benevolent
• Not distinguished from AI• Intelligent machines
‒With beyond-human capabilities• Overwhelmingly identified with
neural nets‒As far back as 1968
The View from Popular Culture
4
“The relays are not unlike the synapses of the human brain.”
Historical View
• The application of statistical learning theory to construct accurate predictors (f: inputs→outputs) from data
• Today, this is feasible even with large number of inputs (“features”) due to the availability of powerful computing machines‒Optimization solvers‒Parallel programming
My Answer
6
How does statistical learning differfrom “classical” statistics?
Vanilla Statistics
• Model the MOSFET threshold voltage shift following the removal of bias-temperature stress
• Physics-based model adopted with confidence• Parameter values are extracted from measurement data
‒ Regression (“least squares estimation”)
Statistical Learning Theory
9
Figures from J. Zerbe, JSSC, 12/03
• Domain knowledge used to select the model class ‒Rational function: Z(s) = a(s)/b(s)
• A priori, I do not know what order of b(s) is needed to represent the data (# of poles)‒ If I set the order of b(s) high, I risk overfitting (e.g., fitting to measurement noise)
• Goal of SLT is to predict input-output mapping (not merely parameter extraction)‒Cost functions penalize both under and over fitting
• Example:
‒ Least squares fitting + L2 regularization• Regularization + sufficiently high p used to find the Goldilocks
solution
Cost Functions used in Machine Learning
Scikit-learn.org
𝐶𝐶 𝜃𝜃 =1𝑛𝑛�𝑖𝑖=1
𝑛𝑛
ℎ𝜃𝜃 𝑋𝑋(𝑖𝑖) − 𝑌𝑌(𝑖𝑖) 2 + 𝜆𝜆�𝑗𝑗=1
𝑝𝑝
𝜃𝜃𝑗𝑗2
• Linear regression• Logistic regression• Neural networks• Kernel methods• Supervised learning• Unsupervised learning• Principal components analysis• Stochastic gradient descent• Back propagation• And more …• First step: Identify the right set of tools for your application
Machine Learning Analyses and Algorithms
• Design respins have not been eliminated• Many of the observed failures during qualification testing are
the direct result of an insufficient modeling capability‒Sources of such failures include mistuned analog circuits, signal timing
errors, reliability problems, and crosstalk [1]
‒Variability cannot be modeled in a manner that is both accurate and computationally efficient
• Simulation-based design optimization has had only limited success‒Simulation “in-the-design-loop” often too slow and leads to impractical
designs• Proposal: use machine learning algorithms to overcome those
hurdles!
Limits of Modern EDA
[1] Harry Foster, “2012 Wilson Research Group Functional Verification Study,”http://www.mentor.com/products/fv/multimedia/the-2012-wilson-research-group-functional-verification-studyview
• Machine learning can be applied in ways that do not make our lives more pleasant
• However, using ML to extract models needed for electronic design automation will make our professional lives better‒Design optimization ‒Shorter time-to-market
ML + EDA
13
I want to talk to a real person!!!
• Several parameters (e.g., kTIM, #TSV, fan-speed) must be optimized to limit the clock skew ( < 110 ps)• Used ML-based Bayesian Optimization• More generally, seek to find Xopt = 𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑎𝑛𝑛
𝐗𝐗((f(X)). Accurate modeling of f(X) needed only near min
Thermal Design Optimization for 3D-IC
S. J. Park, H, Yu and M. Swaminathan, Submitted to EMC Symposium, 2016
1-D Demo of Bayesian Optimization
15
True function: 𝑓𝑓 𝑥𝑥 = 6𝑥𝑥 − 2 2 sin 12𝑥𝑥 − 4 𝑤𝑤ℎ𝑤𝑤𝑎𝑎𝑤𝑤 𝑥𝑥 ∈ [0,1]
Plot generated by P. Franzon(CAEML/NCSU) using python GPyOpt package
1-D Demo of Bayesian Optimization
16
True function: 𝑓𝑓 𝑥𝑥 = 6𝑥𝑥 − 2 2 sin 12𝑥𝑥 − 4 𝑤𝑤ℎ𝑤𝑤𝑎𝑎𝑤𝑤 𝑥𝑥 ∈ [0,1]
Plot generated by P. Franzon(CAEML/NCSU) using python GPyOpt package
1-D Demo of Bayesian Optimization
17
True function: 𝑓𝑓 𝑥𝑥 = 6𝑥𝑥 − 2 2 sin 12𝑥𝑥 − 4 𝑤𝑤ℎ𝑤𝑤𝑎𝑎𝑤𝑤 𝑥𝑥 ∈ [0,1]
Plot generated by P. Franzon(CAEML/NCSU) using python GPyOpt package
1-D Demo of Bayesian Optimization
18
True function: 𝑓𝑓 𝑥𝑥 = 6𝑥𝑥 − 2 2 sin 12𝑥𝑥 − 4 𝑤𝑤ℎ𝑤𝑤𝑎𝑎𝑤𝑤 𝑥𝑥 ∈ [0,1]
Plot generated by P. Franzon(CAEML/NCSU) using python GPyOpt package
1-D Demo of Bayesian Optimization
19
True function: 𝑓𝑓 𝑥𝑥 = 6𝑥𝑥 − 2 2 sin 12𝑥𝑥 − 4 𝑤𝑤ℎ𝑤𝑤𝑎𝑎𝑤𝑤 𝑥𝑥 ∈ [0,1]
Plot generated by P. Franzon(CAEML/NCSU) using python GPyOpt package
1-D Demo of Bayesian Optimization
20
True function: 𝑓𝑓 𝑥𝑥 = 6𝑥𝑥 − 2 2 sin 12𝑥𝑥 − 4 𝑤𝑤ℎ𝑤𝑤𝑎𝑎𝑤𝑤 𝑥𝑥 ∈ [0,1]
Plot generated by P. Franzon(CAEML/NCSU) using python GPyOpt package
1-D Demo of Bayesian Optimization
21
True function: 𝑓𝑓 𝑥𝑥 = 6𝑥𝑥 − 2 2 sin 12𝑥𝑥 − 4 𝑤𝑤ℎ𝑤𝑤𝑎𝑎𝑤𝑤 𝑥𝑥 ∈ [0,1]
Plot generated by P. Franzon(CAEML/NCSU) using python GPyOpt package
1-D Demo of Bayesian Optimization
22
True function: 𝑓𝑓 𝑥𝑥 = 6𝑥𝑥 − 2 2 sin 12𝑥𝑥 − 4 𝑤𝑤ℎ𝑤𝑤𝑎𝑎𝑤𝑤 𝑥𝑥 ∈ [0,1]
Plot generated by P. Franzon(CAEML/NCSU) using python GPyOpt package
1-D Demo of Bayesian Optimization
23
True function: 𝑓𝑓 𝑥𝑥 = 6𝑥𝑥 − 2 2 sin 12𝑥𝑥 − 4 𝑤𝑤ℎ𝑤𝑤𝑎𝑎𝑤𝑤 𝑥𝑥 ∈ [0,1]
Plot generated by P. Franzon(CAEML/NCSU) using python GPyOpt package
1-D Demo of Bayesian Optimization
24
True function: 𝑓𝑓 𝑥𝑥 = 6𝑥𝑥 − 2 2 sin 12𝑥𝑥 − 4 𝑤𝑤ℎ𝑤𝑤𝑎𝑎𝑤𝑤 𝑥𝑥 ∈ [0,1]
Plot generated by P. Franzon(CAEML/NCSU) using python GPyOpt package
1-D Demo of Bayesian Optimization
25
True function: 𝑓𝑓 𝑥𝑥 = 6𝑥𝑥 − 2 2 sin 12𝑥𝑥 − 4 𝑤𝑤ℎ𝑤𝑤𝑎𝑎𝑤𝑤 𝑥𝑥 ∈ [0,1]
Plot generated by P. Franzon(CAEML/NCSU) using python GPyOpt package
1-D Demo of Bayesian Optimization
26
True function: 𝑓𝑓 𝑥𝑥 = 6𝑥𝑥 − 2 2 sin 12𝑥𝑥 − 4 𝑤𝑤ℎ𝑤𝑤𝑎𝑎𝑤𝑤 𝑥𝑥 ∈ [0,1]
Plot generated by P. Franzon(CAEML/NCSU) using python GPyOpt package
1-D Demo of Bayesian Optimization
27
True function: 𝑓𝑓 𝑥𝑥 = 6𝑥𝑥 − 2 2 sin 12𝑥𝑥 − 4 𝑤𝑤ℎ𝑤𝑤𝑎𝑎𝑤𝑤 𝑥𝑥 ∈ [0,1]
Plot generated by P. Franzon(CAEML/NCSU) using python GPyOpt package
1-D Demo of Bayesian Optimization
28
True function: 𝑓𝑓 𝑥𝑥 = 6𝑥𝑥 − 2 2 sin 12𝑥𝑥 − 4 𝑤𝑤ℎ𝑤𝑤𝑎𝑎𝑤𝑤 𝑥𝑥 ∈ [0,1]
Plot generated by P. Franzon(CAEML/NCSU) using python GPyOpt package
1-D Demo of Bayesian Optimization
29
True function: 𝑓𝑓 𝑥𝑥 = 6𝑥𝑥 − 2 2 sin 12𝑥𝑥 − 4 𝑤𝑤ℎ𝑤𝑤𝑎𝑎𝑤𝑤 𝑥𝑥 ∈ [0,1]
Plot generated by P. Franzon(CAEML/NCSU) using python GPyOpt package
1-D Demo of Bayesian Optimization
30
True function: 𝑓𝑓 𝑥𝑥 = 6𝑥𝑥 − 2 2 sin 12𝑥𝑥 − 4 𝑤𝑤ℎ𝑤𝑤𝑎𝑎𝑤𝑤 𝑥𝑥 ∈ [0,1]
Plot generated by P. Franzon(CAEML/NCSU) using python GPyOpt package
Done!
• Fast-to-simulate models of components needed for system design• Black-box or behavioral models replace the physical models• Surrogate modeling methodology developed to support this• “Surrogates are nonlinear regression models fitted with data obtained
from deterministic numerical simulations using optimal sampling” [Osioand Amon, in Res. Eng. Des., Dec. 1996]
Black-box Component Models for System Design
T. Zhu et al. in Surrogate-Based Modeling and Optimization, ed. Koziel and Leifsson, Springer, 2013.
• Generally uses non-parametric regression in which the data determine the model structure‒ In essence, the model is constructed by interpolation using splines,
kernels, kriging …‒No requirement that variance be constant or that the data follow a normal
distribution• Neural network may be used
Surrogate Modeling Procedure
Goal: Identify the optimal set of calibration knobs for this design
Surrogate Model Based Circuit Design
Multi-dimensional response surfaces
Initial Design
Two knobs (Vd, Vcomp)
Three knobs(Vd, Vcomp, Iin)
Fourth knob (Vc) too sensitive to use
7D : 3 responses, 4 “knobs”
Perform:Sensitivity AnalysisSimulation-based samplingModel fitting
Ref: P. Franzon(CAEML / NCSU)
Time-Delay Neural Network Model of Power Amp
34
NN used to learn I-O mapping forbaseband in-phase and quadrature signals
Model validation:Measurement vs. Model for a Class AB PA (IS95 input signal)
T. Liu et al. in IEEE Trans-MTT, Mar. 2004.
• RNN are proven to approximate any system which can be represented by a smooth nonlinear state space model,i.e.,
��̇�𝑥 = 𝑓𝑓 𝑥𝑥,𝑢𝑢𝑦𝑦 = 𝑎𝑎(𝑥𝑥,𝑢𝑢) ;
u is input, y is output, x is state
• Many circuits and devices can be represented by state space models
Transient Modeling Using Recurrent Neural Networks
Outputs y1xy:yt = bout+xtWout
? g
?
?
Input1xi
Winixn
Wrecnxn
Woutnxy
x1xn
Output1xy
Initial states
x01xn
bin1xn
bout1xy
?
Hidden states x1xn :xt = g(bin+utWin+xt-1Wrec)g(.) = activation, e.g., tanh
g
g
Inputs u1xi
Parameter set :bin, Win, Wrec, bout,Wout , x0
RNN model simulates 12x faster than transistor-level model (HSPICE sims)
RNN for Buffer Chip Modeling
Y. Cao et al. in Trans-MTT, June 2006.
• “Mechanistic models can (1) contribute to scientific understanding, (2) provide a basis for extrapolation, (3) provide a representation of the response function that is more parsimonious than the one attainable empirically.” Box, Hunter and Hunter, in Statistics for Experimenters, Wiley, 1978.
• Empirical model is an approximation of the true I-O function ‒Do not use when a physical I-O relation can be formulated ‒Use empirical model when complexity and dimensionality of problem
make the physics-based approach infeasible‒Example: Behavioral model for 16 high-speed input receivers that are
coupled together through the common PDN
Mechanistic vs. Empirical Models
37
• Illustration: a poor use of behavioral / surrogate / empirical modeling
• Learn an RNN model of an RC circuit‒Model is simulated in Spectre
• The RNN approximates the true state-space model with less than 100% accuracy‒And is less computationally efficient
Mechanistic vs. Empirical Models, cont’d
• Illustration: a potentially good use of behavioral / surrogate / empirical modeling
• Model of a LIN driver pin is extracted from pulsed I-V (+ package parasitics from IBIS)
• Transient sims (at right) have poor fidelity‒ Next, authors manually augment model with a non-
linear time-variant capacitor and a parallel RL‒ Fit improved but not validated for other waveforms
• ML provides a more rigorous approach
Mechanistic vs. Empirical Models, cont’d-2
39
S. Bertonnaud et al., in 2012 EOS/ESD Symp. Proc.
F. Escudie et al. in 2016 EOS/ESD Symp. Proc.
• Machine learning can be used to derive generative models as well as discriminative models
• Generative model represents the joint pdf p(X, Y) and can be used to generate new samples (i.e. synthetic data)
• May be applied to represent process/manufacturing variations
Generative Models
• S-parameter model of coupled microstrip lines (S41 shown)• 50 training samples (red)
‒ Line widths and dielectric permittivity vary, 𝜎𝜎 = 0.1 � �̅�𝑥• 500 generated samples (blue)• 475 validation samples (green)
Generative Modeling Example
S. De Ritter et al., in EDAPS 2016.
Complete, machine learning pipeline:1. Data acquisition and storage2. Data selection and filtering3. Cost function specification and selection of model classes4. Model fitting to training data 5. Model evaluation and validationEnd-to-end performance is key!
ML Modules and Pipelines
42
Pipeline for optical character recognition in photos(Prof. A. Ng, Machine Learning, Coursera)
• Our vision:‒ To enable fast, accurate design and verification of microelectronic circuits
and systems by creating machine-learning algorithms to derive models used for electronic design automation
• An NSF I/UCRC‒ Industry/University Cooperative Research Center‒ Three sitesUniversity of Illinois at Urbana-ChampaignGeorgia TechNorth Carolina State University
Center for Advanced Electronics through Machine Learning (CAEML)
• Model-based design becomes faster and more accurate‒Result: designs that are more reliable, lower power, smaller area
• Simulation-based design verification is more accurate‒Products pass qualification testing the first time; shorter time-to-market
• Computationally efficient system-level analysis of manufacturing variations‒Eliminate excessive guard-banding
CAEML Expected Outcomes
• Modular machine learning (theory)• High-speed links• Power delivery• System-level ESD• IP reuse• Design rule checking
• Future: capacitor placement optimization, compact modeling, counterfeit/tamper detection ….
Projects Supported by CAEML (current)