Appro Supercomputer Solutions
Steven Lyness, VP HPC Solutions Engineering
Appro and Tsukuba University Accelerator Cluster Collaboration
Company Overview Appro Celebrates 20 Years of HPC Success….
About Appro Over 20 Years of Experience
Moving Forward….
1991 – 2000: OEM Server Manufacturer
2001 – 2007: Branded Servers & Cluster Solutions Manufacturer
2007 – 2012: End-to-End Supercomputer Solutions
• Over 2 PFLOPS (peak) across just five Top100 systems added to the Top500 in November
• Variety of technologies: −Intel, AMD, NVIDIA
−Multiple server form factors
−Infiniband and GigE
−Fat Tree and 3D Torus
• Excellent Linpack efficiency on non-optimized Sandy Bridge (SB) systems
−85.5% Fat Tree
−83% - 85% 3D Torus
Appro on Top 500
Appro Milestones: Installations in 2012
Site / Peak Performance:
− Los Alamos (LANL): > 1.8 PFLOPS
− Sandia (SNL): > 1.2 PFLOPS
− Livermore (LLNL): > 1.5 PFLOPS
− Japan (Tsukuba, Kyoto): > 1 PFLOPS
• HA-PACS (Highly Accelerated Parallel Advanced system for
Computational Sciences)
Apr. 2011 – Mar. 2014, 3-year project
Project Leader: Prof. M. Sato (Director, CCS, Univ. of Tsukuba)
• Develop next generation GPU system : 15 members
Project Office for Exascale Computing System Development
(Leader: Prof. T. Boku)
GPU cluster based on Tightly Coupled Accelerators architecture
• Develop large scale GPU applications : 15 members
Project Office for Exascale Computational Sciences
(Leader: Prof. M. Umemura)
Elementary Particle Physics, Astrophysics, Bioscience, Nuclear/Quantum
Physics, Global Environmental Science, High Performance Computing
About University of Tsukuba HA-PACS Project
:: Problem Definition
University of Tsukuba - HA-PACS Project
• Many technology discussions to determine the KEY requirements:
Fixed budget
High Availability
Latest Processor / High Flops
1:2 CPU:Accelerator Ratio
High Bandwidth to the Accelerator
High bandwidth, low latency interconnect
Apps Could take advantage of “more than QDR IB”
High IO Bandwidth to storage
“Easy to Manage”
2012 GTC Conference
Solution Keys
Fixed Budget Considerations
Need to find a balance between:
Performance - Flops, bandwidth (memory, IO)
Capacity (CPU Qty, GPU Qty, Memory per core, IO, Storage)
Availability Features
Ease of Management / Supportability
Architecture needed: High Availability
Nodes (PS, Fans)
IPC networks (Ex. InfiniBand)
Service Networks (Provisioning and Management)
What Appro Brings to NWS
Challenge: Create a Solution with High Availability
− Redundant power supplies
− Redundant hot swap fan trays
− Redundant Hot swap disk drives
− Redundant Networks
Solution: Appro Xtreme-X™ Supercomputer, flagship product line using the GreenBlade™ sub-rack component used for the DoE TLCC2 project
Expand to add support for new custom blade nodes
Meeting Key Requirements
:: Appro Xtreme-X™ Supercomputer
Solution Architecture
Unified scalable cluster architecture that can
be provisioned and managed as a stand-alone
supercomputer.
Improved power & cooling efficiency to
dramatically lower total cost of ownership
Offers high performance and high availability
features with lower latency and higher
bandwidth.
Appro HPC Software Stack - Complete HPC
Cluster Software tools combined with the
Appro Cluster Engine™ (ACE) Management
Software including the following capabilities:
System Management
Network Management
Server Management
Cluster Management
Storage Management
Optimal Performance
Meeting Key Requirements
Peak Performance CPU Contribution
Sandy Bridge-EP 2.6 GHz E5-2670 Processor (332 GFlops per node)
GPU Contribution
665 GFlops per NVIDIA M2090
Four (4) M2090s per node, or 2.66 TFlops per node
Combined peak performance is 3 TFlops per node
Two hundred sixty-eight (268) nodes provide 802 TFlops
Accelerator Performance DEDICATED PCI-e Gen3 X16 for each NVIDIA GPU
Uses Gen2 so we have up to 8 GB/s per GPU available
IO Performance: 2 x QDR (Mellanox CX3) - up to 4 GB/s per link (on a PCI-e Gen3 X8 bus)
GigE for Operations networks
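As a sanity check, the peak numbers above can be reproduced with a few lines of arithmetic (figures are the peak values quoted on this slide, not sustained performance):

```python
# Peak-performance arithmetic for one HA-PACS node and the full system,
# using the per-component numbers quoted on this slide.
CPU_NODE_GFLOPS = 332.8          # 2x Sandy Bridge-EP E5-2670 @ 2.6 GHz
GPU_GFLOPS = 665.0               # NVIDIA M2090, double precision
GPUS_PER_NODE = 4
NODES = 268

node_peak = CPU_NODE_GFLOPS + GPUS_PER_NODE * GPU_GFLOPS   # GFLOPS
system_peak_tflops = NODES * node_peak / 1000.0

print(f"node peak:   {node_peak:.1f} GFLOPS (~3 TFLOPS)")
print(f"system peak: {system_peak_tflops:.0f} TFLOPS")     # ~802 TFLOPS
```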
Up to 4x 2P GB812X blades
− Expandability for HDD, SSD, GPU, MIC
Six Cooling Fan Units
− Hot swappable & redundant
Up to six 1600W power supplies
− Platinum-rated; 95%+ efficient
− Hot swappable & redundant
Supports one or two (redundant) iSCB platform-manager modules with
enhanced management capabilities
− Active & dynamic fan control
− Power monitoring
− Remote power control
− Integrated console server
Appro GreenBlade™ Sub-Rack With Accelerator Expansion Blades
Appro Confidential and Proprietary
Appro GreenBlade™ Subrack
• Server Board
−Increased memory footprint (2 DPC)
−Provides access to two (2) PCI-e Gen3 X16 PER SOCKET
• Provides for increased IO capability
−QDR or FDR InfiniBand on the motherboard
−Internal RAID Adapter on Gen3 bus
• Up to two (2) 2.5” Hard drives
NOTE: Can run diskless/stateless thanks to the Appro Cluster Engine, but local scratch was needed
iSCB Modules
Challenge: Create a server node with
− Latest Generation of processors: Need for flops AND IO capacity
− HIGH bandwidth to the Accelerators
− High Memory capacity
Solution: High Bandwidth Intel Sandy Bridge-EP for CPU and the NVIDIA Tesla for GPU
Worked with Intel® EPSD early on to design a motherboard
− Washington Pass (S2600WP) Motherboard with:
Dual Sandy Bridge-EP (E5-2600 series) sockets
Expose four (4) PCI-e Gen3 X16 for Accelerator Connectivity
Expose one (1) PCI-e Gen3 X8 for Expansion slot/IO
Two (2) DIMMS Per channel (16 DIMMS total)
− 2U form factor for fit and air flow/cooling
Server Node Design
Meeting Key Requirements
[Block diagram: two Sandy Bridge-EP sockets linked by QPI, each with 4 channels of 1,600 MHz DDR3 (51.2 GB/sec per socket); four PCI-e Gen3 x16 links to 4 x NVIDIA M2090; PCI-e Gen3 x8 to dual QDR IB; Patsburg PCH (DMI/ESI) with dual GbE, BMC and BIOS.]
Intel® EPSD S2600WP Motherboard
Meeting Key Requirements
GreenBlade Node Design
[Node connections: QDR InfiniBand ports 0 and 1; GigE cluster management / operations networks (primary and secondary); HDD0 and HDD1.]
:: Network Availability
Meeting Key Requirements
Challenge: Provide cost-effective redundant networks to eliminate/reduce failures (improve MTTI)
Solution: − Build the system with redundant operations Ethernet networks
Redundant on-board GigE each with access to IPMI
Redundant iSCB Modules for baseboard management, node control and monitoring
− Build system with redundant InfiniBand networks
DUAL QDR for price/performance
Selected Mellanox due to Gen3 X8 support (dual port adapter)
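A rough calculation shows why Gen3 x8 support matters for a dual-port QDR adapter. The coding efficiencies below are the standard PCIe/InfiniBand line codes; real links carry further protocol overhead, so these are upper bounds:

```python
# Back-of-envelope PCIe vs. InfiniBand data rates (GB/s), counting only
# line-coding overhead; all figures are approximate upper bounds.
def link_gbs(gtransfers_per_s, coding_efficiency, lanes):
    # GT/s per lane x coding efficiency x lanes, divided by 8 bits/byte
    return gtransfers_per_s * coding_efficiency * lanes / 8

gen2_x8 = link_gbs(5.0, 8 / 10, 8)        # PCIe Gen2 x8: 4.0 GB/s
gen3_x8 = link_gbs(8.0, 128 / 130, 8)     # PCIe Gen3 x8: ~7.88 GB/s
qdr_port = link_gbs(10.0, 8 / 10, 4)      # one QDR 4X port: 4.0 GB/s
dual_qdr = 2 * qdr_port                   # both ports active: 8.0 GB/s

# A Gen2 x8 slot would cap the dual-port HCA at one port's worth of
# bandwidth; Gen3 x8 can feed both ports at close to full rate.
print(gen2_x8, gen3_x8, dual_qdr)
```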
:: Operations Networking
Meeting Key Requirements
[Topology diagram: management and login nodes connect through a 10GigE switch to the external network; each group of three racks (Rack 1 … Rack N) has a sub-management node (GreenBlade™ GB812X) and 48-port leaf switches feeding the compute nodes over GbE.]
:: Ease of Use
Meeting Key Requirements
Challenge • Need the system to install quickly to get into production
• Most sites have limited people resources
• Need to keep the system running and doing science
Solution • Appro HPC Software Stack
− Tested and Validated
− Full stack from HW layer to Application layer
− Allows for quick bring up of a cluster
Appro HPC Software Stack
User Applications
Compilers: Intel® Cluster Studio, PGI (PGI CDK), GNU, PathScale
Message Passing: MVAPICH2, OpenMPI, Intel® MPI (Intel® Cluster Studio)
Job Scheduling: Grid Engine, PBS Pro, SLURM
Performance Monitoring: HPCC, IOR, netperf, Perfctr, PAPI/IPM
Cluster Monitoring: ACE™ (iSCB and OpenIPMI)
Console Mgmt: ACE™, ConMan
Remote Power Mgmt: PowerMan
Provisioning: ACE™
Virtual Clusters: Appro Cluster Engine (ACE™)
OS: Linux (Red Hat, CentOS, SuSE)
Storage / File Systems: NFS 3.x, Lustre, PanFS, Local FS (ext3, ext4, XFS)
Appro Xtreme-X™ Supercomputer – Building Blocks
Appro Turn-Key Integration & Delivery Services: HW and SW integration, pre-acceptance testing, dismantle, packing and shipping
Appro HPC Professional Services: On-site installation services and/or customized services
:: Summary
Appro Key Advantages
• Partnering with Key technology partners to offer cutting-edge
integrated solutions:
− Performance
Storage IOR
Networking Bandwidth, latencies and message rates
− Features
High Availability (high MTBF, redundant power supplies)
Ease of Management
− Flexibility
− Price /Performance
− Training Programs
Pre-Sales (Sell everything it does and ONLY that)
Installation and Tuning
Post Install Support
Appro Corporate Presentation
Turn-Key Solution Summary
Appro Cluster Engine™ (ACE) Management Software Suite
Capability Computing
Hybrid Computing
Capacity Computing
Data Intensive
Computing
Appro Xtreme-X™ Supercomputer addressing 4 HPC Workload Configurations
Appro HPC Software Stack
Turn-Key Integration & Delivery Services
- Node, rack, switch, interconnect, cable, network, storage, software, burn-in - Pre-acceptance testing, performance validation, dismantle, packing and shipping
Appro HPC Professional Services - On-site Installation services and/or Customized services
Appro Xtreme-X™ Supercomputer
Appro Supercomputer Solutions
Questions?
Steve Lyness, VP HPC Solutions Engineering
Ask Now or see us at Table #54
Learn More at www.appro.com
Taisuke Boku
Center for Computational Sciences
University of Tsukuba [email protected]
HA-PACS Next Step for Scientific Frontier
by Accelerated Computing
2012/05/15
GTC2012, San Jose
Project plan of HA-PACS
HA-PACS (Highly Accelerated Parallel Advanced system for Computational Sciences): accelerating critical problems in various scientific fields at the Center for Computational Sciences, University of Tsukuba
− The target application fields will be partially limited
− Current target: QCD, Astro, QM/MM (quantum mechanics / molecular mechanics, for life science)
Two parts − HA-PACS base cluster:
for development of GPU-accelerated code for target fields, and performing product-run of them
− HA-PACS/TCA: (TCA = Tightly Coupled Accelerators)
for elementary research on new technology for accelerated computing
Our original communication system based on PCI-Express named “PEARL”, and a prototype communication chip named “PEACH2”
GPU Computing: current trend of HPC
GPU clusters in TOP500 on Nov. 2011 − 2nd Tianhe-1A (天河) (Rpeak=4.70 PFLOPS)
− 4th Nebulae (星雲) (Rpeak=2.98 PFLOPS)
− 5th TSUBAME2.0 (Rpeak=2.29 PFLOPS)
− (1st K Computer Rpeak=11.28 PFLOPS)
Features − high peak performance / cost ratio
− high peak performance / power ratio
− large-scale applications with GPU acceleration don't yet run in production on GPU clusters ⇒ our first target is to develop large-scale applications accelerated by GPUs in real computational sciences
Problems of GPU Cluster
Problems of GPGPU for HPC − Data I/O performance limitation
Ex) GPGPU: PCIe gen2 x16
Peak Performance: 8GB/s (I/O) ⇔ 665 GFLOPS (NVIDIA M2090)
− Memory size limitation Ex) M2090: 6GByte vs CPU: 4 – 128
GByte
− Communication between accelerators: no direct path (external) ⇒ communication latency via CPU becomes large
Ex) GPGPU: GPU mem ⇒ CPU mem ⇒ (MPI) ⇒ CPU mem ⇒ GPU mem
Research on direct communication between GPUs is required
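The staged path above can be illustrated with a simple, hypothetical cost model. The latencies and bandwidths below are assumed round numbers for illustration, not measurements from HA-PACS:

```python
# Illustrative cost model (not measured data) for the staged GPU-to-GPU
# path described above: GPU mem -> CPU mem -> (MPI) -> CPU mem -> GPU mem.
# Each hop costs a fixed latency plus size/bandwidth.
def hop_time(size_gb, latency_s, bw_gbs):
    return latency_s + size_gb / bw_gbs

def staged_transfer(size_gb):
    d2h = hop_time(size_gb, 10e-6, 6.0)   # device->host copy (assumed numbers)
    mpi = hop_time(size_gb, 2e-6, 4.0)    # MPI over one QDR rail (assumed)
    h2d = hop_time(size_gb, 10e-6, 6.0)   # host->device copy (assumed)
    return d2h + mpi + h2d

def direct_transfer(size_gb):
    return hop_time(size_gb, 2e-6, 4.0)   # hypothetical direct GPU-GPU link

# Small messages are dominated by the extra per-hop latencies, which is
# why direct GPU-to-GPU communication is needed for strong scaling.
for size in (1e-6, 1e-3):                 # 1 KB and 1 MB, expressed in GB
    print(size, staged_transfer(size), direct_transfer(size))
```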
Another target is to develop a direct communication system between external GPUs as a feasibility study for future accelerated computing
Project Formation
HA-PACS (Highly Accelerated Parallel Advanced system for Computational Sciences)
Apr. 2011 – Mar. 2014, 3-year project
Project Leader: Prof. M. Sato (Director, CCS, Univ. of Tsukuba)
Develop next generation GPU system : 15 members
Project Office for Exascale Computing System Development (Leader: Prof. T. Boku)
GPU cluster based on Tightly Coupled Accelerators architecture
Develop large scale GPU applications : 15 members
Project Office for Exascale Computational Sciences (Leader: Prof. M. Umemura)
Elementary Particle Physics, Astrophysics, Bioscience, Nuclear/Quantum Physics, Global Environmental Science, High Performance Computing
HA-PACS base cluster (Feb. 2012)
HA-PACS base cluster
Front view
Side view
HA-PACS base cluster
Rear view of one blade chassis with 4 blades
Front view of 3 blade chassis
Rear view of Infiniband switch and cables (yellow=fibre, black=copper)
HA-PACS: base cluster (computation node)
Computation node totals: 3 TFLOPS peak per node
− CPU: 20.8 GFLOPS x 16 cores = 332.8 GFLOPS (AVX, 2.6 GHz x 8 flop/clock)
− GPU: 665 GFLOPS x 4 = 2,660 GFLOPS
− CPU memory: (16 GB, 12.8 GB/s) x 8 = 128 GB, 102.4 GB/s
− GPU memory: (6 GB, 177 GB/s) x 4 = 24 GB, 708 GB/s
− PCIe to each GPU: 8 GB/s
Intel Xeon E5 (SandyBridge-EP) x 2
− 8 cores/socket (16 cores/node) with 2.6 GHz
− AVX (256-bit SIMD) on each core ⇒ peak perf./core = 2.6 GHz x 4 (SIMD) x 2 (add+mul) = 20.8 GFLOPS ⇒ peak perf./socket = 166.4 GFLOPS ⇒ peak perf./node = 332.8 GFLOPS
− Each socket supports up to 40 lanes of PCIe gen3 ⇒ great performance to connect multiple GPUs without an I/O performance bottleneck ⇒ the current NVIDIA M2090 supports just PCIe gen2, but the next generation (Kepler) will support PCIe gen3
− Four M2090s can be connected to two Sandy Bridge-EPs while still leaving two PCIe gen3 x8 links ⇒ InfiniBand QDR x 2
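The per-socket figure follows from clock rate x SIMD width x issue width x core count; a minimal check:

```python
# Sandy Bridge-EP AVX peak: 256-bit SIMD holds 4 doubles, and separate
# add and multiply ports give 8 DP flops per clock per core.
GHZ = 2.6
SIMD_DP = 4          # 256-bit AVX / 64-bit double
ISSUE = 2            # one add + one multiply per cycle
CORES = 8

per_core = GHZ * SIMD_DP * ISSUE          # 20.8 GFLOPS
per_socket = per_core * CORES             # 166.4 GFLOPS
per_node = per_socket * 2                 # 332.8 GFLOPS
print(per_core, per_socket, per_node)
```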
HA-PACS: base cluster unit (CPU)
HA-PACS: base cluster unit (GPU)
NVIDIA M2090 x 4
− Number of processor core: 512
− Processor core clock: 1.3 GHz
− DP 665 GFLOPS, SP 1331GFLOPS
− PCI Express gen2 ×16 system interface
− Board power dissipation: <= 225 W
− Memory clock: 1.85 GHz, size: 6GB with ECC, 177GB/s
− Shared/L1 Cache: 64KB, L2 Cache: 768KB
HA-PACS: base cluster unit (blade node)
[Blade node, front and rear views: 2x 2.6 GHz 8-core Sandy Bridge-EP, 4x NVIDIA Tesla M2090, 1x PCIe slot for HCA, 2x 2.5" HDD; the 8U enclosure holds 4 nodes, 3 hot-swappable PSUs and 6 hot-swappable fans.]
Basic performance data
MPI pingpong
− 6.4 GB/s (N1/2= 8KB)
− with dual rail Infiniband QDR (Mellanox ConnectX-3)
− actually FDR for HCA and QDR for switch
PCIe benchmark (Device -> Host memory copy), aggregated perf. for 4 GPUs simultaneously
− 24 GB/s (N1/2= 20KB)
− PCIe gen2 x16 x4, theoretical peak = 8 GB/s x4 = 32 GB/s
Stream (memory)
− 74.6 GB/s
− theoretical peak = 102.4 GB/s
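Expressed as fractions of their theoretical peaks, the measurements above all land at roughly 70-80% efficiency:

```python
# Measured vs. theoretical-peak bandwidth from the slide, as efficiency.
benchmarks = {
    "MPI pingpong (dual-rail QDR)":  (6.4, 8.0),
    "PCIe D->H, 4 GPUs aggregated":  (24.0, 32.0),
    "STREAM (node memory)":          (74.6, 102.4),
}
for name, (measured, peak) in benchmarks.items():
    print(f"{name}: {measured}/{peak} GB/s = {100 * measured / peak:.1f}%")
```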
PCIe Host:Device communication performance
Slower start on Host->Device compared with Device->Host
HA-PACS Application (1): Elementary Particle Physics
Multi-scale physics: investigate hierarchical properties via direct construction of nuclei in lattice QCD; GPU used to solve large sparse linear systems of equations
Finite temperature and density: phase analysis of QCD at finite temperature and density; GPU used to perform matrix-matrix products of dense matrices
[Figure: quark → proton/neutron → nucleus; expected QCD phase diagram]
HA-PACS Application (2): Astrophysics
(A) Collisional N-body Simulation: computations of the accelerations of particles and their time derivatives (jerks) are time-consuming. Direct (brute-force) calculations of accelerations and jerks are required to achieve the required numerical accuracy; they are computed on GPU.
− Globular clusters: fossil objects as a clue to investigate the primordial universe; formation of the most primordial objects, formed more than 10 billion years ago
− Massive black holes in galaxies: understanding their formation; numerical simulations of complicated gravitational interactions between stars and multiple black holes in galaxy centers
(B) Radiation Transfer: calculation of the physical effects of photons emitted by stars and galaxies on the surrounding matter. So far poorly investigated due to its huge computational cost, though it is of critical importance in the formation of stars and galaxies. Computations of the radiation intensity and the resulting chemical reactions based on ray-tracing methods can be highly accelerated on GPUs owing to their high concurrency.
− First stars and re-ionization of the universe: understanding the formation of the first stars and the subsequent re-ionization of the universe
− Accretion disks around black holes: study of the high-temperature regions around black holes
HA-PACS Application (3): Bioscience
DNA-protein complex (macroscale MD); reaction mechanisms (QM/MM-MD) with a QM region of > 100 atoms
GPU acceleration: direct Coulomb interactions (Gromacs, NAMD, Amber); 2-electron integrals
HA-PACS Application (4)
Other advanced researches on HPC Division in CCS
− XcalableMP-dev (XMP-dev): an easy and simple programming language to support distributed-memory & GPU-accelerated computing for large-scale computational sciences
− G8 NuFuSE (Nuclear Fusion Simulation for Exascale) project platform for porting Plasma Simulation Code with GPU technology
− Climate simulation especially for LES (Large Eddy Simulation) for cloud-level resolution on city-model size simulation
− Any other collaboration ...
HA-PACS: TCA (Tightly Coupled Accelerator)
TCA: Tightly Coupled Accelerator
− Direct connection between accelerators (GPUs)
− Using PCIe as a communication link between accelerators
Most acceleration devices and other I/O devices are connected by PCIe as PCIe end-points (slave devices)
An intelligent PCIe device logically enables an end-point device to communicate directly with other end-point devices
PEARL: PCI Express Adaptive and Reliable Link
− We already developed such PCIe device (PEACH, PCI Express Adaptive Communication Hub) on JST-CREST project “low power and dependable network for embedded system”
− It enables direct connection between nodes by PCIe Gen2 x4 link
⇒ Improving PEACH for HPC to realize TCA
PEACH
PEACH: PCI-Express Adaptive Communication Hub
An intelligent PCI-Express communication switch to use PCIe link directly for node-to-node interconnection
The edge of a PEACH PCIe link can be connected to any peripheral device, including GPUs
Prototype PEACH chip − 4-port PCI-E gen.2 with x4 lane / port
− PCI-E link edge control feature: “root complex” and “end points” are automatically switched (flipped) according to the connection handling
− Other fault-tolerant (reliability) function is implemented: “flip network link” to allow single link fault
In HA-PACS/TCA prototype development, we will enhance the current PEACH chip ⇒ PEACH2
HA-PACS/TCA (Tightly Coupled Accelerator)
Enhanced version of PEACH
⇒ PEACH2 − x4 lanes -> x8 lanes
− hardwired on main data path and PCIe interface fabric
[Diagram: two nodes, each with CPUs, memory, GPUs and an IB HCA on PCIe; PEACH2 in each node links the nodes' PCIe fabrics directly, bypassing the IB switch.]
True GPU-direct
− current GPU clusters require 3-hop communication (3-5 memory copies)
− for strong scaling, an inter-GPU direct communication protocol is needed for lower latency and higher throughput
Implementation of PEACH2: ASIC⇒FPGA
FPGA-based implementation − today's advanced FPGAs allow a PCIe
hub with multiple ports
− currently gen2 x8 lanes x 4 ports are available ⇒ gen3 will be available soon (?)
− easy modification and enhancement
− fits to standard (full-size) PCIe board
− internal multi-core general purpose CPU with programmability is available ⇒ easily split hardwired/firmware partitioning on certain level on control layer
Controlling PEACH2 for GPU communication protocol
− collaboration with NVIDIA for information sharing and discussion
− based on CUDA4.0 device to device direct memory copy protocol
HA-PACS/TCA Node Cluster (NC)
[Diagram: a Node Cluster of 16 nodes: 64 GPUs (G), 32 CPUs (C), one PEACH2 per node on a PEARL ring network, GPU communication over PCIe, one Infiniband link per node; CPU: Xeon E5, GPU: Kepler. Multiple Node Clusters are joined by an Infiniband network.]
4 NCs with 16 nodes, or 8 NCs with 8 nodes = 360 TFLOPS extension to the base cluster
− High-speed GPU-GPU communication by PEACH2 within an NC (PCI-E gen2 x8 = 5 GB/s/link)
− Infiniband QDR (x2) for NC-to-NC communication (4 GB/s/link)
PEARL/PEACH2 variation (1)
[Diagram: dual-socket node (QPI); a PCIe switch connects the 4 GPUs (Gen3 x16), the IB HCA (Gen3 x8) and PEACH2 (Gen2 x8) to the CPUs.]
Option 1:
− Performance of IB and PEARL can be compared evenly
− Additional latency from the PCIe switch
PEARL/PEACH2 variation (2)
[Diagram: alternative PCIe lane assignment for the 4 GPUs (Gen3 x16), IB HCA (Gen3 x8) and PEACH2 (Gen2 x8) on a dual-socket node.]
Option 2:
− Requires only 72 lanes in total
− Asymmetric connection among the 3 blocks of GPUs
PEACH2 prototype board for TCA
[Board photo: FPGA (Altera Stratix IV GX530); two external PCIe link connectors (one more on the daughter board); PCIe edge connector to the host server; daughter-board connector; power regulators for the FPGA.]
Summary
HA-PACS consists of two elements: the HA-PACS base cluster, for application development, and HA-PACS/TCA, for elementary study of advanced technology for direct communication among accelerating devices (GPUs)
The HA-PACS base cluster started operation in Feb. 2012 with 802 TFLOPS peak performance (Linpack results will come in June 2012; we also expect a good score on the Green500)
The prototype FPGA implementation of PEACH2 was finished in Mar. 2012 and will be enhanced into the final version over the following 6 months
HA-PACS/TCA, with at least 300 TFLOPS of additional performance, will be installed around Mar. 2013