On-chip Learning Neural Network Hardware
Implementation for Real-time Control
Prof. Dr. Martin Brooke, Bortecene Terlemez
Current Status
• Simulation: two-frequency simulation, added-noise simulation
• Experiments: 1-second suppression, long runs
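Simulation Setup
[Figure: block diagram of the simulation loop. Blocks: Unstable Combustion Model (signals x and u), Software Simulation of Neural Network Chip, delay line, 1.5 ms delay; error signals feed the chip simulation.]
[Equation: combustion model in x, u, and b; garbled in the source.]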
One Frequency Plant without Control
One Frequency Result
[Figure: engine pressure and NN weights versus time (seconds). f = 400 Hz, b = (value missing).]
Two Frequency Results
[Figure: engine pressure and NN weights versus time (seconds). f = 400 Hz and 700 Hz, b = (value missing).]
10% Added Noise Results
[Figure: two panels, Uncontrolled Engine and Neural Network Controlled Engine. f = 400 Hz, [?] = 0.005, b = 1.]
Continuously Changing Plant Parameters (1 point/second)
Continuously Changing Plant Parameters (50 points/second)
Experimental Setup
[Figure: test setup built around two National Instruments boards, AT-MIO-16E and AT-AO-10. Labels: chip control signals (5, digital output), chip output (1, analog input, after current-to-voltage conversion), chip input (8, analog output).]
Short run-time
f = 400 Hz
Long run-time
f = 400 Hz
Experimental Conclusions
Oscillation suppressed in less than a few seconds.
Continuous Adaptation.
Issues
• Competing technology status: general-purpose HW vs. dedicated HW
• Controller initialization: How to find optimum weights? How to set the weights?
Dedicated NN Hardware
• Serial Digital [1]
• Partially Parallel Digital [2]
• Fully Parallel Digital [3]
• Fully Parallel Analog [4]
References
[1] Torsten Lehmann, Erik Bruun, and Casper Dietrich, "Mixed Analog/Digital Matrix-Vector Multiplier for Neural Network Synapses", Analog Integrated Circuits and Signal Processing, Vol. 9, pp. 55-63, 1996.
[2] Antonio J. Montalvo, Ronald S. Gyurcsik, and John J. Paulos, "An Analog VLSI Neural Network with On-Chip Perturbation Learning", IEEE Journal of Solid-State Circuits, Vol. 32, No. 4, April 1997.
[3] S. Neusser and B. Hofflinger, "Parallel Digital Neural Hardware for Controller Design", Mathematics and Computers in Simulation, Vol. 41, pp. 149-160, 1996.
[4] Maurizio Valle, Daniele D. Caviglia, and Giacomo M. Bisio, "An Experimental Analog VLSI Neural Network with On-Chip Back-Propagation Learning", Analog Integrated Circuits and Signal Processing, Vol. 9, pp. 231-245, 1996.
Time for One Forward Propagation

Time                          1x1     10x10    100x100    1,000x1,000
Serial Digital                 40     4,000    400,000     40,000,000
Partially Parallel Digital    309     3,090     30,900        309,000
Fully Parallel Digital      1,770     1,770      1,770          1,770
Fully Parallel Analog         100       100        100            100

(Time: Number of Gate Delays)
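The gate-delay figures above follow simple scaling laws in the layer width N of an N x N layer: the serial design grows as N squared, the partially parallel design as N, and the fully parallel designs stay constant. The following is a minimal Python sketch that reproduces the table. The function name and dictionary layout are illustrative, and the exponents are inferred from the table entries rather than stated in the slides.

```python
# Gate delays for one forward propagation of a 1x1 layer, taken from the table.
BASE_DELAYS = {
    "Serial Digital": 40,
    "Partially Parallel Digital": 309,
    "Fully Parallel Digital": 1770,
    "Fully Parallel Analog": 100,
}

# Growth exponent in N. Assumption: inferred from the table, not stated in the slides.
TIME_EXPONENT = {
    "Serial Digital": 2,              # delay grows with the number of synapses, N*N
    "Partially Parallel Digital": 1,  # delay grows with N
    "Fully Parallel Digital": 0,      # delay independent of N
    "Fully Parallel Analog": 0,
}


def forward_gate_delays(architecture: str, n: int) -> int:
    """Estimated gate delays for one forward pass through an n x n layer."""
    return BASE_DELAYS[architecture] * n ** TIME_EXPONENT[architecture]


for name in BASE_DELAYS:
    print(name, [forward_gate_delays(name, n) for n in (1, 10, 100, 1000)])
```

Read this way, the table says that the fully parallel designs trade a fixed delay for hardware that grows with the layer, while the serial design does the opposite.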
Area

Gate Numbers                  1x1      10x10      100x100    1,000x1,000
Serial Digital             82,500     82,500       82,500         82,500
Partially Parallel Digital 55,000    550,000    5,500,000     55,000,000
Fully Parallel Digital        140     14,000    1,400,000    140,000,000
Fully Parallel Analog          17      1,700      170,000     17,000,000
(Area: Number of Transistors)
Today's Technology - 0.35-µm CMOS

Speed (ns)                    1x1      10x10    100x100    1,000x1,000
Serial Digital              7.472      747.2     74,720      7,472,000
Partially Parallel Digital  57.72      577.2      5,772         57,720
Fully Parallel Digital     330.63     330.63     330.63         330.63
Fully Parallel Analog       18.68      18.68      18.68          18.68

Chip area (mm²)               1x1      10x10    100x100    1,000x1,000
Serial Digital              2.476      2.476      2.476          2.476
Partially Parallel Digital   2.68       26.8        268          2,680
Fully Parallel Digital      0.362       36.2      3,620        362,000
Fully Parallel Analog    0.000868     0.0868       8.68            868
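The 0.35-µm speed figures appear to be the gate-delay counts multiplied by a single per-gate delay. The sketch below infers that delay from the serial 1x1 entries (7.472 ns for 40 gate delays, about 0.1868 ns per gate) and applies it to the other architectures. The constant is an inference from the two tables, not a number given in the slides, and the names are illustrative.

```python
# Assumption: one gate delay in the 0.35-um process, inferred from the
# serial 1x1 entries of the two tables (7.472 ns / 40 gate delays).
GATE_DELAY_NS = 7.472 / 40

# 1x1 gate-delay counts from the "Time for One Forward Propagation" table.
GATE_DELAYS_1X1 = {
    "Serial Digital": 40,
    "Partially Parallel Digital": 309,
    "Fully Parallel Digital": 1770,
    "Fully Parallel Analog": 100,
}


def to_nanoseconds(gate_delays: float) -> float:
    """Convert a gate-delay count to nanoseconds for the 0.35-um process."""
    return gate_delays * GATE_DELAY_NS


for name, delays in GATE_DELAYS_1X1.items():
    print(f"{name}: {to_nanoseconds(delays):.2f} ns")
```

The converted values agree with the 1x1 column of the speed table to within 0.01 ns.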
Area and Time Requirement for 0.35-µm CMOS Process
[Figure: log-log plot of time (ns) versus area (mm²) for Serial Digital, Partially Parallel Digital, Fully Parallel Digital, and Fully Parallel Analog at 1x1, 10x10, 100x100, and 1000x1000.]
Area and Time Estimation for 70-nm CMOS Process

Speed (ns)                    1x1      10x10    100x100    1,000x1,000
Serial Digital              2.596      259.6     25,960      2,596,000
Partially Parallel Digital  20.05      200.5      2,005         20,050
Fully Parallel Digital     114.87     114.87     114.87         114.87
Fully Parallel Analog        6.49       6.49       6.49           6.49

Chip area (mm²)               1x1      10x10    100x100    1,000x1,000
Serial Digital              0.099      0.099      0.099          0.099
Partially Parallel Digital 0.1027      1.027      10.27          102.7
Fully Parallel Digital    0.01448      1.448      144.8         14,480
Fully Parallel Analog  0.00003472   0.003472     0.3472          34.72
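The 70-nm area estimates are consistent with scaling the 0.35-µm areas by the square of the feature-size ratio, (70/350)² = 1/25. The sketch below applies that rule to the 1x1 entries. The quadratic scaling rule is an assumption inferred from the numbers, and the function name is illustrative.

```python
# 1x1 chip areas in mm^2 for the 0.35-um process, taken from the table.
AREA_035_MM2 = {
    "Serial Digital": 2.476,
    "Partially Parallel Digital": 2.68,
    "Fully Parallel Digital": 0.362,
    "Fully Parallel Analog": 0.000868,
}


def scale_area(area_mm2: float,
               old_feature_nm: float = 350.0,
               new_feature_nm: float = 70.0) -> float:
    """Scale an area by the squared feature-size ratio (assumed scaling rule)."""
    return area_mm2 * (new_feature_nm / old_feature_nm) ** 2


for name, area in AREA_035_MM2.items():
    print(f"{name}: {scale_area(area):.4g} mm^2")
```

Three of the four rows match the 70-nm table under this rule. The partially parallel row comes out at 0.1072 mm² against the table's 0.1027 mm², which may reflect a transposed digit or a differently rounded source value. The speed entries shrink by a smaller factor (2.596/7.472, about 0.35), so delay evidently was not assumed to scale with area.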
Area and Time Requirement for 70-nm CMOS Process
[Figure: log-log plot of time (ns) versus area (mm²) for Serial Digital, Partially Parallel Digital, Fully Parallel Digital, and Fully Parallel Analog at 1x1, 10x10, 100x100, and 1000x1000.]
Controller Initialization
How to find weights
• Simulation: is this good enough?
• Recorded training: simulation
• Problem: current chips are volatile. Solution: FPGA
Recorded Simulation (current chip)
[Figure: engine pressure and NN weights versus time (seconds); the error decreases. f = 400 Hz, [?] = 0.0, b = 0.1.]
• Error Decrease Signal
• Random Sequence
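The two recorded signals, an error-decrease flag and a random sequence, are what a random weight change (RWC) style learning rule needs in order to be replayed. The sketch below shows one common form of that rule on a toy problem: a random perturbation is reapplied while the error keeps falling and redrawn when it does not. It reads "RWC" as random weight change and is not taken from the chip's actual circuit; the error function, step size, and all names are hypothetical.

```python
import random


def rwc_train(error_fn, weights, step=0.01, iterations=2000, seed=0):
    """Random weight change: keep a perturbation while the error falls."""
    rng = random.Random(seed)

    def random_sequence():
        # One +step or -step value per weight, drawn independently.
        return [rng.choice((-step, step)) for _ in weights]

    delta = random_sequence()
    prev_error = error_fn(weights)
    for _ in range(iterations):
        weights = [w + d for w, d in zip(weights, delta)]
        error = error_fn(weights)
        error_decreased = error < prev_error  # plays the role of the error-decrease signal
        if not error_decreased:
            delta = random_sequence()         # draw a new random direction
        prev_error = error
    return weights, prev_error


def toy_error(weights):
    # Hypothetical stand-in for the measured oscillation amplitude:
    # squared distance from the all-zero weight vector.
    return sum(w * w for w in weights)


initial = [1.0, -1.0, 0.5]
final_weights, final_error = rwc_train(toy_error, initial)
print(f"error: {toy_error(initial):.3f} -> {final_error:.5f}")
```

Because the rule needs only one bit of feedback per step plus the random sequence, logging those two streams is enough to reconstruct the weight trajectory offline, which is presumably why they are the recorded quantities.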
Controller Initialization
How to set weights
• Recorded simulation/training (current chips)
• Permanent analog weight
• Digital weight storage (FPGA, custom)
Permanent Weight Storage
[Figure: timeline of floating-gate (FG) devices and circuits from 1967 to 1999. Digital non-volatile memories: Kahng and Sze (1967), EEPROM, FLASH. Entries between 1989 and 1999: ETANN, Brooke et al., Shibata/Ohmi, Yang, STLS, AFGA, Adaptive Retina, ISD Voice Recorder, (?).]
Past EEPROM NN
Permanent weight version of current chip
Permanent Analog Weight: Floating-Gate MOS
[Figure: device cross-sections comparing regular CMOS with floating-gate MOS (n and p diffusions, n-well).]
RWC module with Floating-Gate MOS
Digital Weight Storage
• Custom digital chips
• Field Programmable Gate Arrays (FPGA)
Custom digital chips
• 13-bit programmable DAC
• 6-8 bits probably enough
• Expensive/slow to develop
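To give a feel for the gap between a 13-bit DAC and the 6 to 8 bits judged probably sufficient, the sketch below rounds a weight to the nearest level of an ideal DAC and reports the worst-case rounding error at each resolution. The weight range of -1 to 1, the ideal uniform DAC, and the function names are assumptions made for illustration.

```python
def quantize_weight(weight, bits, w_min=-1.0, w_max=1.0):
    """Round a weight to the nearest level of an ideal `bits`-bit DAC."""
    steps = 2 ** bits - 1                     # number of steps across the range
    clipped = min(max(weight, w_min), w_max)  # saturate outside the range
    code = round((clipped - w_min) / (w_max - w_min) * steps)
    return w_min + code * (w_max - w_min) / steps


def worst_case_error(bits, w_min=-1.0, w_max=1.0):
    """Half of one DAC step: the largest rounding error inside the range."""
    return (w_max - w_min) / (2 * (2 ** bits - 1))


for bits in (6, 8, 13):
    stored = quantize_weight(0.3, bits)
    print(f"{bits:2d} bits: stored {stored:.6f}, "
          f"worst-case error {worst_case_error(bits):.6f}")
```

Under these assumptions, each extra bit roughly halves the worst-case error, so the choice between 6, 8, and 13 bits comes down to how much weight error the controller tolerates.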
Field Programmable Gate Arrays (FPGA)
• Reconfigurable
• Flexible
• Low-cost design cycle
• 1992: first ANN on FPGA; 30 XC3090 devices (8000 gates each) used; each neuron with 14 synapses: 2 FPGAs + 1 EPROM
• Today: very high-density FPGAs (>3 million gates) with partial dynamic reconfiguration
RRANN
Run-time Reconfigurable Artificial Neural Networks (RRANN)
Time sharing the limited computing resource.
Conclusion
• FPGA technology ready
• Faster development
• Plan to adapt current test setups
• Plan to attempt weight initialization: recorded simulation/training (current chips), digital weights (FPGA)