CAST IP for ASICs and FPGAs:
Introduction and Overview
August 2014
Introduction and Overview
Who is CAST?
How do we provide “A Better IP Experience”?
What products are available?
  IP Cores
  IP Platforms
Why Choose CAST?
CAST Intro & Overview2
About CAST
Successful IP provider/developer/partner
  Twenty years' experience delivering IP
  Privately held, financially stable
  Based in NJ (USA), with international partner network
Unique market approach
  We only do IP, designed for reusability
  Independent of semiconductor technologies and EDA tools
  Broad range of IP, featuring:
    32- and 8-bit processors/controllers, Video & Image Compression, and the Interfaces, Peripherals, Memory Controllers, and other IP to build complete systems around them
Expert Development Team
All products developed by CAST or tightly-coupled partners
Multimedia Platforms
Serial Communications
High-Speed Buses
System Integration
Processor Platforms
AMBA Infrastructure IP
Image/Video Compression
Memory Controllers
Encryption
Automotive Bus Interfaces
Video Codecs & Interfaces
32-bit Processors & Platforms
Graphics & Display Processors
Extended Partner Network
IP Development Partners
Semiconductor Providers
Industry Associations & Portals
Mixed-Signal Partners
EDA & Software Partners
Technical Partners
Sales Partners
Extreme Customer Focus
Global team of ~100 people
  CAST offices in East and West coast USA, Brazil, and Europe
  Sales/support partners in Middle East and Asia
24/7 culture with very fast response
  Always online with Email, IM, home offices
  Actual IP developers available to help with support
Experience with diverse customers and applications
  Pre-sales help in selecting the right IP
  Post-sales support during system integration
2013 Customers by Region
  Americas: 50%
  Asia: 41%
  Europe: 9%
“A Better IP Experience”?
Proven, high-quality IP products
  Broad line from a single, successful provider
  Competitive pricing and simple licensing
  Ready to use: docs, scripts, testbenches, etc.
Less risk
Flexible products
  EDA-independent, technology-neutral
  RTL source or FPGA netlist
Knowledgeable, technical sales team
  1,900 sales to 800 customers
  We know the questions you should be asking
A stable, reliable IP provider with hundreds of design wins
Superior support
  Based on 20 years working with IP customers
CAST makes designing with reusable IP a better experience, from your first “make versus buy” considerations through the successful completion of your product.
“We have been working with CAST IP for the last two years for our state-of-the-art 3D camera. We have found their knowledge of system design to be very helpful for completing our project. CAST’s high level of support has been critical to our success.”
- Eli Larry, Electronic Group Manager, 3DV Systems (now owned by Microsoft)
Customer Successes
“iSine provides custom ASIC and SoC solutions to multiple market segments. The quality and support of CAST IP cores have saved us valuable time to market with these products. In this highly competitive environment, this advantage is critical to the success of our company.
CAST has repeatedly and quickly helped us out of last-minute jams and multi-vendor IP interface issues.”
- Robert Gross, Senior Engineer, iSine, Inc. (see www.isine.com)
GIT Japan Uses CAST 8051 in AIST’s MEMS-EFS Electrostatic Sensor
“CAST’s efficient, easy-to-use 8051 core was an excellent solution for our challenge of building AIST’s innovative electrostatic field sensor system.”
- Yoshinori Nakagawa, Design Engineer, GIT Japan
Customer Successes
“The CAST CAN core has been successfully integrated into our new night vision camera product, which has already been shipped to one of our customers. The configuration of the core you have provided fit perfectly with our requirements so we decided to use it as is without any modification. Overall, I think that the core's performance was excellent, while the detailed documentation contributed a lot to the fast integration of the core into our system.”
CAST JPEG IP Core Helps Kapsch TrafficCom Handle Real-Time Image Compression
“The IP-core is well documented and has fulfilled our demands in all aspects of quality, performance, efficient usage of FPGA-resources as well as configurability.”
- Erik Larsson, Product Mgr. for Video and Sensor Solutions, Kapsch TrafficCom, 2010
Processors and Controllers
  BA2x 32-bit Processor Family: Royalty-free high performance with low power
  8051 IP leader for many years
Video and Image Compression
  Video: the highest quality H.264 encoders
  Image codecs: the most choices, JPEG to J2K
Graphics Processing
  Single- and multi-core GPUs
  2D/2.5D Graphic Accelerator
  Multilayer LCD Display Controller
IP for Complete SoCs
  PCIe, CAN, SPI, and other Interfaces
  Ethernet MAC & IP Stacks
  Encryption, Memory Controllers, more
CAST IP Highlights
IP Products
VIDEO & IMAGE COMPRESSION
  H.264/AVC: Baseline, Main & High Profile Encoders & Decoder
  RTP Stack; Video Over IP Subsystem, Application Platform
  JPEG 2000: Encoder, Platform
  JPEG: Encoder, Decoder, Codec; Scalado Speedtag; 12/8-bit Extended Encoder & Decoder
  Lossless: LJPEG Encoder, Decoder; JPEG-LS Encoder
SECURITY & ENCRYPTION
  AES programmable, GCM
  DES, 3DES
  MD5
  SHA-1, SHA-256
GRAPHICS & PERIPHERALS
  Graphics Processors: Nema Multicore GPU, Single-Core GPU, 2D/2.5D Graphics Accelerator
  Display Controllers: Multilayer LCD Display
  NOR Flash Controllers: Serial & Parallel NOR Flash
  Device Controllers: Smart Card Reader
  Network Stacks: UDP/IP Stack, Hardware RTP Stack
  Data Link Controllers: SDLC & HDLC
  AMBA Infrastructure Cores & AHB 32-bit DMA
  Legacy Peripherals: DMA Controllers, UARTs, Timer/Counter
CONTROLLERS & PROCESSORS
  32-bit BA2x Family
    Application Processors: Full & Basic
    Embedded Processors: Real-Time, Deeply Embedded, Low-Power
    Dev & Debug Packages
  8051 Compatibles: Super-Fast Advanced, Fast Mature, Legacy-Configurable
INTERCONNECTS
  DisplayPort Transmitter, Receiver
  Ethernet MAC
  PCI Express X1/X4 & X8 controllers, app interface
  PCI Targets, Masters, Host Bridges & AHB interfaces
  Serial: CAN, CAN FD, I2C, LIN, SPI
BA2x 32-bit Processors
Competitive
  Fast: up to 1.7 DMIPS/MHz, 2.44 CoreMarks/MHz
  Small: as low as 15k gates
  Power-Saving: as low as 0.02 mW/MHz
  Efficient: class-leading code density
Complete
  Integration: 8051-like peripherals bundling
  Ecosystem: programming tools, libraries, dev kits
Attractive
  Production-proven and royalty-free
[Chart: ARM7TDMI, Cortex-M0, Cortex-M3, and BA22 compared; vertical axis 0 to 1.6]
BA2™ ISA Extreme Code Density
Code Memory consumes many times more resources than any processor
Denser code means much less static & dynamic energy usage
BA2x code is significantly denser than any other
Uses variable length instructions: 16-, 24-, 32-, and 48-bit
32 general purpose registers versus 16 GPRs typical
Optimized compiler
Superior density measures
  20% better density than Thumb-2 (customer evaluation)
  Best CSiBE benchmark results
[Chart: Area (kGates) for Processors & Code Memory. Code Memory: 2, 4, 8, 16, and 32 KBytes. Processors: 8-bit, 16-bit, BA22-DE, BA22-EP, BA22-AP, 64-bit. Vertical axis: Area (kGates), 0 to 300]
[Chart caption: CSiBE GCC Benchmark Code Size Results]
Using BA2x – Targeted Versions
BA25 Advanced Application Processor
  ARM® Cortex-A7/A8 class
BA22-AP Basic Application Processor
  Applications using off-chip instruction & data memories; may need to run a full OS
  Similar to ARM® Cortex-A5/A7, ARM 9/11 class
BA22-RT Real-Time Embedded Processor
  Applications using off-chip instruction & data memories that may need an RTOS
  Similar to ARM® Cortex-R series class
BA22-DE Deeply-Embedded Systems
  Applications using on-chip instruction & data memories
  Similar to ARM® Cortex-M0/M3 class
BA21 Low-Power Deeply-Embedded Processor
  Similar to ARM® Cortex-M0 series class
Complete GNU Tool Chain for Windows or Linux
Cycle-Accurate Instruction Set Simulator (ISS)
JTAG Debugging
Ported C libraries and OSs
Software Build Tools
  BeyondStudio for Eclipse IDE: ISS, JTAG Debugging
  Support by Lauterbach TRACE32
Development Board Kits
  Talos-1 for BA22-DE
  Raptor for BA22-RT & BA22-AP
BA2x System Development
Customer-Proven
OptoMotive GigE Vision v2.0 Camera
  2048x1088, 340 fps, PoE and more
BA22 Processor Core
  Runs Linux OS providing tftp, web,...
  GigE Vision v2.0 software stack
  Initialization and control of camera functions
“We went from licensing the BA22 processor [to] tape-out in just five months. … Despite our stringent requirements, integration and software development was straightforward.”
- Ram Rangarajan, Vice President, Imaging, inSilica
8051 Cores
Built on 15 years of 8051 experience
Fourth-generation code, proven products
Hundreds of core sales, scores of different applications & customers
Shipped in hundreds of millions of customer units
Solutions for quicker development and verification
Seamless interface to Keil and IAR 8051 software dev tools
Embedded software debugging package
Royalty-free licensing
“CAST’s 8051 controller offered the best combination of features, performance, and terms that we could find.
The proven track record of both the 8051 core and the support team at CAST give us great confidence as we pursue this next great engineering challenge.”
- Emad Afifi, VP Engineering, Ensphere Solutions, Inc., March 2013
8051 Cores
Up to 26.85x DMIPS/MHz of original 80C51 (12.1x for 8-bit)
Up to 800 MHz
Power usage from 2.3 µW/MHz
CPU area from 6.5k gates
S8051XC3
  0.252 DMIPS/MHz, 26.85x faster than original
  Advanced power management support, including DFS
R8051XC2
  12.1x faster than original
  Mature code, shipping in millions of devices
L8051XC1
  Matches timing and peripherals of older 8051-based MCUs
  Replaces obsolete parts, and/or enables reuse of your legacy code
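The quoted S8051XC3 figures can be related to each other with simple arithmetic. The sketch below is only an illustration: it backs out the original 80C51 rating implied by "0.252 DMIPS/MHz" and "26.85x", and scales the per-MHz figure to a clock frequency. The function names and the 100 MHz example clock are assumptions made for illustration, not CAST data.

```python
def implied_baseline_dmips_per_mhz(core_dmips_per_mhz: float, speedup: float) -> float:
    """Return the baseline DMIPS/MHz implied by a core rating and a speedup factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return core_dmips_per_mhz / speedup


def dmips_at_clock(core_dmips_per_mhz: float, clock_mhz: float) -> float:
    """Scale a per-MHz Dhrystone rating to an absolute DMIPS value at a given clock."""
    if clock_mhz < 0:
        raise ValueError("clock_mhz must be non-negative")
    return core_dmips_per_mhz * clock_mhz


if __name__ == "__main__":
    # Figures quoted on the slide for the S8051XC3.
    baseline = implied_baseline_dmips_per_mhz(0.252, 26.85)
    print(f"Implied original 80C51 rating: {baseline:.5f} DMIPS/MHz")
    # 100 MHz is a hypothetical clock chosen only for this example.
    print(f"S8051XC3 at 100 MHz: {dmips_at_clock(0.252, 100.0):.1f} DMIPS")
```

The achievable clock depends on the target technology, so the absolute DMIPS value is indicative only.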
Graphics Processing Cores
Display Interfaces: DisplayPort
Display Controllers: LCD-CTRL, ThinkLCD-ML
Graphics Accelerator: Think2.5D
Graphics Processing Units: ThinkVG, Nema
APIs: DirectFB, Android HWC, OpenVG, OpenCL, OpenGL|ES
Text and Bitmaps, 2D-Graphics, 3D-Graphics, Rich 3D-Graphics
Display Controller Cores

Feature                                             LCD-CTRL        ThinkLCD-ML
Max Resolution                                      1024x768        32kx32k
Pixel Formats                                       RGB 24/16-bit   RGB 24/16-bit, RGBX/XRGB 32/16-bit, YUYV, Greyscale, RGB232
Color Space Conversion                              No              Yes
Color Palette                                       Yes             Yes (per layer)
Graphics Layers                                     1               4
Alpha Blending (Overlays)                           No              Yes
Scaling                                             No              Yes
Cursor Support                                      No              Fixed or Programmable
Dithering, Gamma Correction, Brightness, Contrast   No              Yes
SoC Bus Interface                                   AHB 32-bit      AHB 32-bit or AXI 32/64-bit
Pixel DMA                                           Yes             Yes
Linux Frame Buffer Drivers                          No              Yes
Graphics Accelerators and GPUs

Feature           Think2.5D                       ThinkVG                                         Nema
API               DirectFB, Android HWC, OpenWF   OpenVG 1.1                                      OpenCL (now), OpenGL (soon), OpenVX (2015)
Blitting          Yes                             No                                              No
Blending          Yes                             Yes                                             Yes
Drawing Engine    Yes                             Yes                                             Yes
Texture Mapping   Optional                        Yes                                             Yes
Shading           No                              Yes                                             Yes
Rasterization     No                              Yes                                             Yes
Architecture      Acceleration (Custom) Pipeline  Unified Shader Plus Accelerators (Single Core)  Unified Shader Plus Accelerators (Multi-Core)
ThinkVG: Vector GPU
Smallest & lowest power GPU in the market
Just 160k gates, excluding memories
OpenVG API lets you exploit the power of Scalable Vector Graphics
Compatible with popular formats (Flash, PDF, vector fonts, etc.)
Ideal for Mobile Devices requiring mid-range graphics at the lowest possible power (e.g. mobile phones or handhelds)
Think2.5D: Use Cases
Standalone 2D/2.5D Graphics Accelerator
  Graphics-enabled Android, Linux, or OS-less devices such as car infotainment, GPS, PDAs, game machines, set-top boxes, TVs, and any device with a similar level of graphical user interface
Composition Engine next to a 3D GPU
  For devices with advanced graphics requirements, such as tablets, PCs, and smartphones
  Think2.5D is assigned the simpler composition tasks, letting the GPU work on 3D rendering or sleep
Think2.5D: Why Consider?
Ultra Low Power
  High performance (2-4 pixels per clock) allows operation at lower frequencies
  Small silicon footprint (just 70 kGates)
  Optional frame buffer compression, for reduced memory bandwidth
  Command lists for lower CPU overhead
Highly Featured
  Advanced Blit and Blend, Font Rendering, and Pseudo-3D (via projections)
Extensive Software Support
  Linux, Qt, GTK: DirectFB, OpenWF
  Android: Android HWC
  Deeply Embedded: Bare-metal API
Nema Configurable GPU
One to many Clusters interconnected with proprietary NoC and sharing external memory
Cluster: 1-4 Cores sharing caches + Texture Engine (on/off) + Graphics Accelerators (on/off)
Core (Unified Shader):
  Up to 128 threads
  Configurable number of pipeline stages
  Configurable support of special functions (e.g. Sin/Cos, Square Root)
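The configuration space described above (clusters, 1-4 cores per cluster, up to 128 threads per core, optional texture engine and accelerators) can be pictured as a small record with range checks. This is a hypothetical sketch for illustration only: the class name, field names, and validation rules are assumptions drawn from the slide's wording, and do not reflect how CAST actually configures the core.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NemaConfig:
    """Hypothetical description of a Nema configuration (names are illustrative)."""
    clusters: int                        # "one to many" clusters
    cores_per_cluster: int               # 1-4 cores share caches in a cluster
    threads_per_core: int                # up to 128 threads per unified-shader core
    texture_engine: bool = True          # per-cluster option (on/off)
    graphics_accelerators: bool = True   # per-cluster option (on/off)

    def __post_init__(self) -> None:
        # Ranges below come from the slide text; anything else is rejected.
        if self.clusters < 1:
            raise ValueError("at least one cluster is required")
        if not 1 <= self.cores_per_cluster <= 4:
            raise ValueError("a cluster holds 1 to 4 cores")
        if not 1 <= self.threads_per_core <= 128:
            raise ValueError("a core supports up to 128 threads")

    def total_cores(self) -> int:
        return self.clusters * self.cores_per_cluster

    def total_threads(self) -> int:
        return self.total_cores() * self.threads_per_core


if __name__ == "__main__":
    minimal = NemaConfig(clusters=1, cores_per_cluster=1, threads_per_core=16,
                         texture_engine=False, graphics_accelerators=False)
    larger = NemaConfig(clusters=2, cores_per_cluster=4, threads_per_core=128)
    print(minimal.total_threads(), larger.total_threads())
```

The two example instances only show how widely the thread count can vary between a minimal and a larger configuration; real area and performance depend on the options selected.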
Nema: Use Cases
3D-GPU for Embedded Applications
  Minimal configurations of Nema offer sufficient performance in a uniquely small silicon footprint
  Nema's configurable architecture can scale to meet the graphics and other processing needs of any application
GP-GPU in Embedded Systems
  Gain flexibility, area, and power by running data- and computation-intensive tasks (e.g. video analytics, image processing, compression) on Nema
Nema: Why Consider?
Configurability: The only GPU that adapts to your requirements
Performance: More GFLOPs/sq.mm
Ultra-Low Power: Less silicon, frame buffer compression, advanced power management
Easy Software Development: Change the hardware, and keep the software untouched!
Ideal for GP-GPU: C/C++ Compiler and OpenCL API allow exploiting Nema's power for video and image processing and other computation- and data-intensive tasks
DisplayPort Cores
Highly Featured Receiver and Transmitter Cores
  DisplayPort 1.1a / 1.2a, including MST & HBR2
  Embedded DisplayPort (eDP) 1.3
  Primary and Secondary Aux Channel, Optional HDCP, Enhanced 3D Video Transport
  Deliverables include reference software driver and link policy maker
PHY Integration
  Already integrated with FPGA and ASIC PHYs
  Development team available to assist in the selection and integration of PHY
H.264 Video Encoder Cores
No-compromise design for maximum quality
  Measurably superior compressed video quality
  Evaluate yourself with simulation models or platform board
Economical, efficient performance
  From mobile screen QCIF to 4Kx4K frames
  Compresses full HD video
    1080p at 30 fps in low-cost FPGAs, beyond 1080p at 60 fps in ASICs
Easier system integration and application support …
Main Profile Encoder: best quality video
Baseline Profile Encoder: smaller core, still great quality
Baseline, Main, or High Profile Decoder: fast & efficient
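To give a rough feel for the workload behind "1080p at 30 fps", the sketch below estimates the uncompressed input data rate and the number of 16x16 macroblocks per second an H.264 encoder has to process. It assumes 8-bit 4:2:0 video (12 bits per pixel), which the slides do not specify, and the function names are illustrative rather than part of any CAST deliverable.

```python
def raw_video_bytes_per_second(width: int, height: int, fps: int,
                               bits_per_pixel: int = 12) -> int:
    """Uncompressed data rate; 12 bits/pixel corresponds to 8-bit 4:2:0 (assumed)."""
    return width * height * bits_per_pixel * fps // 8


def macroblocks_per_second(width: int, height: int, fps: int) -> int:
    """H.264 codes pictures in 16x16 macroblocks; partial blocks are rounded up."""
    mb_columns = (width + 15) // 16
    mb_rows = (height + 15) // 16
    return mb_columns * mb_rows * fps


if __name__ == "__main__":
    for fps in (30, 60):
        rate = raw_video_bytes_per_second(1920, 1080, fps)
        mbs = macroblocks_per_second(1920, 1080, fps)
        print(f"1080p{fps}: {rate / 1e6:.1f} MB/s raw input, {mbs} macroblocks/s")
```

For 1080p at 30 fps this works out to about 93.3 MB/s of raw input and 244,800 macroblocks per second; doubling the frame rate doubles both figures.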
H.264 Encoder Cores, cont.
Easier system integration and application support
  Streaming-capable interfaces (Avalon-ST, or AMBA options)
  Stand-alone operation (no processor required once running)
  Flexible external memory interface
    Low-bandwidth, latency-tolerant, independent of memory type
Run-time tunable operation
  Switch modes
    Constant Bit Rate (CBR) to fit restricted bandwidth
    Variable Bit Rate (VBR) to meet specified quality level
    Intra-Only for still-image compression competitive with JPEG 2000
  Deblocking Filter when needed for even better image quality
Trouble-free technology mapping
Up to Level 5.1 compliance with both CABAC and CAVLC coders
[Chart: PSNR vs. Rate (CQP Encoding). Series: JM (MP), H264-MP-E, JM (BP), H264-BP-E. Horizontal axis: Rate (kbits/sec), 0 to 2500. Vertical axis: PSNR (dB), 32 to 44]
CAST H.264 Quality Tracks Standard
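For readers unfamiliar with the metric on the chart's vertical axis, PSNR is conventionally computed from the mean squared error between the original and the decoded picture: PSNR = 10 log10(MAX^2 / MSE). The sketch below shows that standard formula on flat lists of 8-bit samples. It is a generic illustration, not the measurement procedure used for the chart, and the function name is an assumption.

```python
import math
from typing import Sequence


def psnr_db(reference: Sequence[int], decoded: Sequence[int], max_value: int = 255) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized sample sequences."""
    if len(reference) != len(decoded) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf  # identical pictures
    return 10.0 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    original = [100, 100, 100, 100]
    reconstructed = [101, 99, 101, 99]  # every sample off by one, so MSE = 1
    print(f"{psnr_db(original, reconstructed):.2f} dB")
```

Higher values mean the decoded picture is closer to the source, so curves that lie close together at a given bit rate indicate comparable quality.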
Image Compression Cores Family
JPEG Encoder & Decoder; 8/12-bit extended JPEG Encoder & Decoder
Scalado SpeedView JPEG Encoder
  Lossy compression
  Cameras, image storage, etc.
LJPEG Encoder, Decoder
  Lossless compression
  Professional video, medical imaging
JPEG-LS Encoder
  Lossless compression
  The best lossless compression algorithm, from HP
JPEG 2000 Encoder
  Lossy and Lossless compression
  Military and aerospace systems, surveillance, medical imaging
JPEG IP Example
“CAST provided a robust JPEG solution that allowed Pixel Velocity to reduce its development time to get to market faster. CAST provided all the necessary tools to validate the solution in software accelerating the development process of Pixel Velocity personnel. The integration of the CAST JPEG core with the Pixel Velocity system was easy.
The CAST personnel was always available to offer support and answer questions.”
- Josh Patterick, Lead Engineer while at Pixel Velocity
The Pixel Video Fusion™ high-def security system has been installed at Chicago O'Hare International Airport and other locations.
JPEG 2000 Encoder
Flexible: lossless and lossy compression
Images stored at multiple resolution levels
Ideal for surveillance:
  Thumbnails for quick scanning
  Higher-res images retrieved for detailed analysis
Encoder features
  Processing: Tier-1 and Tier-2 (typically left to the user)
  BIIF option for military imaging standard support
  Region of Interest (ROI) option for selective quality
System performance trade-off parameters
  Output Bit Rate Control
  Number of Entropy Encoders
  Full-image or tile processing
Video Over IP Subsystem
Combines H.264 High Profile Encoder with RTP and UDP/IP Hardware Stacks
Ultra-low latency: overall delay under 5ns
Flexible, with options for interfaces, memory, Ethernet or Wi-Fi integration, more
FPGA Reference Design Kits enable rapid development jumpstart
Customization services tailor it to your specific requirements
Video and Image Compression Application Platforms
Quick way to learn or evaluate H.264 or JPEG 2000 before you design
Solid starting point to reduce your system development time
Cooperative effort of CAST’s virtual organization:
CAST USA
CAST Czech Republic
Alma Technologies, Greece
S2C, China
PCIe Endpoint Controller Core
Critical features
  Multiple device link widths: 1X, 4X, and 8X
  PCI-SIG certified
  Low latency, minimal memory requirements, power management
  Technology independent: ASIC- and FPGA-proven
  Free PCIe simulation model download for pre-sales evaluation
Rigorously verified
  Tested with Avery’s proven PCI-Xactor verification IP
  Extensive PHY and Mother Board interoperability testing
  Works with any 16-bit PIPE-compliant PHY
Includes Unique Application Interface (AIF)
  Bridges Transaction Layer Protocol (TLP) and standard SoC buses
  Relieves designer from the TLP knowledge necessary with other cores
Conclusion: Why Choose CAST?
Superior Processor, Compression, and Graphics Products
Controllers, Interfaces, and Functions to complete your SoC
Royalty-Free Licensing and 20 Years of IP Experience reduce your risks
Learn more: visit www.cast-inc.com or call +1 201.391.8300