PUBLIC USE
SAM SIU
INDEPENDENT CONTRIBUTOR
FTF-NET-N1848
MAY 18, 2016
INTRODUCING THE QorIQ HARDWARE
COMPRESSION ENGINE
PUBLIC USE #NXPFTF
AGENDA
• Overview of the Storage Market
• Overview of DCE Hardware
• DCE Program Model
• DCE Performance Analysis
Introduction
• QorIQ DPAA has a hardware accelerator decompression and compression engine
(DCE) that can be used for the storage market and bandwidth optimization
applications.
• This session will cover the following topics:
− Overview of the storage market
− Overview of DCE hardware
− DCE program model for kernel and user space applications.
− DCE performance analysis.
OVERVIEW OF THE
STORAGE MARKET
Storage Applications Overview

Market segments (scale I-V, consumer to data center):
• Consumer NAS
• Prosumer NAS
• Storage Controllers (Arrays)
• Software Defined Storage Platform (SDS)
• Data Center Cold Storage
• Ethernet Drive Cold Storage
Consumer/Prosumer NAS

Scale of offerings: 2-drive (dual 2.5G) through 4-, 5- and 8-drive (dual 2.5G/10G) systems
Requirements: 2-8 ARM cores, dual 2.5G/10G, RAID 5/6, SATA, PCIe, Wi-Fi, USB, LRO/TSO offloads, IPSec offload, encryption

[Block diagram: dual 10G Ethernet and PCIe Gen3 front ends feeding encryption, RAID, compression and deduplication functions, with dual SAS controllers to the drives]
NXP Storage Solutions on Selected Devices

Function                  | LS2088A          | T4240
--------------------------|------------------|-----------------
10G Ethernet              | 8 x 10G + 8 x 1G | 4 x 10G + 6 x 1G
NFS/TCP/iSCSI termination | 20 Gbps          | 30 Gbps
Rolling Hash              | 15 Gbps          | 25 Gbps
SHA-1 Hash                | 15 Gbps          | 25 Gbps
Compression               | 10 Gbps          | 10 Gbps
Encryption                | 20 Gbps          | 20 Gbps
RAID 5/6                  | Yes              | Yes
Power                     | 25-40 W          | 41-54 W
Network Storage Accelerated Solution

[Diagram: NXP LS- and T4/T2-series SoC with 25G Ethernet interfaces and two PCIe Gen3 x4 links, each to a PCIe switch fronting a pair of SSDs]

• 8/12/24* cores, Power and 64-bit ARM
• Multiple integrated 10G/25G* interfaces with DCB support
• PCIe Gen3 & Gen4* controllers
• iSER on CPU cores
• Dedupe support
• Compression
• Encryption
• RAID 5/6 or Erasure Coding

* Next-gen SoCs
Data Path Acceleration Architecture 1 & 2
Performance & Scalability

Network acceleration, saving CPU cycles for higher-value work:
• FMan (Frame Manager): 50 Gbps aggregate parse, classify, distribute
• SEC (Security): 40 Gbps IPSec/SSL; public key 25K/s 1024b RSA
• PME (Pattern Matching): 10 Gbps aggregate
• DCE (Data Compression): 20 Gbps aggregate (10G inflate + 10G deflate)

[Block diagram: QMan/BMan with SW and HW portals connect the SEC, PME and DCE accelerators to either a Layerscape (DPAA2) complex (WRIOP and AIOP parse/classify/distribute and buffering blocks; 1G and 1/10G interfaces; ARM A57 clusters, each core with 32 KB L1-D and 48 KB L1-I, sharing a 1 MB banked L2) or a QorIQ (DPAA1) complex (FMan; dual-thread e6500 Power Architecture cores with 32 KB D- and I-caches and a 2 MB banked L2)]
DCE Enabled Device
• DCE is an optional hardware accelerator for Data Path Acceleration Architecture (DPAA)-enabled devices.
• DPAA1.x
− T4240, T2080, …
• DPAA2.x
− LS2088A, LS2085A, LS2080A, …
OVERVIEW OF DCE
HARDWARE
DCE Hardware Overview
• Deflate
− As specified in RFC 1951
• GZIP
− As specified in RFC 1952
• Zlib
− As specified in RFC 1950
− Interoperable with the zlib 1.2.5 compression library
• Encoding
− Supports Base64 encoding and decoding (RFC 4648)
• Operates at up to 400 MHz
− 10 Gbps compress
− 10 Gbps decompress
− 20 Gbps aggregate
[Block diagram: QMan and BMan interfaces feed a frame agent that drives the compressor (4 KB history) and the decompressor (32 KB history), with a bus interface to the system bus]
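The three wrapper formats the DCE handles differ only in framing around the same DEFLATE payload. Since the engine is interoperable with the zlib library, a minimal software analogue can show the distinction; this sketch uses Python's zlib module (the `wbits` convention is the library's, not the hardware's):

```python
import zlib

data = b"the quick brown fox jumps over the lazy dog" * 100

# Raw DEFLATE (RFC 1951): negative wbits suppresses any header/trailer.
raw = zlib.compressobj(wbits=-15)
deflate_stream = raw.compress(data) + raw.flush()

# zlib (RFC 1950): positive wbits adds a 2-byte header and Adler-32 trailer.
zl = zlib.compressobj(wbits=15)
zlib_stream = zl.compress(data) + zl.flush()

# gzip (RFC 1952): wbits + 16 adds the gzip header and CRC-32 trailer.
gz = zlib.compressobj(wbits=31)
gzip_stream = gz.compress(data) + gz.flush()

# All three decode back to the original payload.
assert zlib.decompress(deflate_stream, wbits=-15) == data
assert zlib.decompress(zlib_stream, wbits=15) == data
assert zlib.decompress(gzip_stream, wbits=31) == data

# gzip magic bytes are 0x1f 0x8b; zlib CMF byte is 0x78 for a 32 KB window.
print(gzip_stream[:2].hex(), zlib_stream[:1].hex())  # → 1f8b 78
```

The DCE performs this header insertion/removal and CRC work in hardware; the snippet only illustrates the on-the-wire difference between the three formats.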
DCE Inputs
• Software enqueues work to DCE via Frame Queues (FQ). FQs define the flow for stateful processing.
• FQ initialization creates a location for the DCE to use when storing flow stream context.
• Each work item within the flow is defined by a Frame Descriptor (FD), which includes length, pointer, offsets, and commands.
• DCE has separate channels for compress and decompress.
[Diagram: command FQs deliver FD1..FD3 through work queues (WQ0-WQ7) on the DCE's direct-connect portal, with separate channels for compress (0) and decompress (1); each FD carries Addr, Offset, Length, Status/Cmd, PID and BPID fields pointing at data buffers, and each flow's stream context is located via the FQ's Context_A]
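The FD fields named above can be modeled for illustration. This is purely a sketch: the field names come from the slide, but the ordering, widths and values are hypothetical, and the real packed layout is defined in the DPAA Reference Manual:

```python
from dataclasses import dataclass

@dataclass
class FrameDescriptor:
    """Illustrative model of the FD fields shown above.

    Not the real bit layout; consult the DPAA Reference Manual for the
    authoritative packed structure.
    """
    addr: int        # physical address of the data buffer (or s/g table)
    offset: int      # start of data within the buffer
    length: int      # number of bytes of data
    bpid: int        # BMan buffer pool ID the buffer belongs to
    pid: int         # partition ID
    status_cmd: int  # command on enqueue to the DCE, status on dequeue

# A flow is an ordered sequence of FDs presented on the same FQ.
flow = [FrameDescriptor(addr=0x8000_0000 + i * 0x1000, offset=0,
                        length=4096, bpid=7, pid=0, status_cmd=0)
        for i in range(3)]
print(len(flow), hex(flow[1].addr))  # → 3 0x80001000
```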
DCE Outputs
• DCE enqueues results to software via Frame Queues (FQs), as defined by the FQ Context_B field. When output buffers are obtained from BMan, the buffer pool ID is defined by the output FQ.
• Each result is defined by a Frame Descriptor (FD), which includes a Status field.
• DCE updates the flow stream context located at Context_A as needed.
• Note: Context_A and Context_B are FQ attributes that store flow-related data and output information. Refer to the DPAA Reference Manual or SDK APIs for details.
[Diagram: result FDs (with Addr, Offset, Length, Status/Cmd, PID and BPID fields) flow from the DCE's compress and decompress channels through the portal to the output FQs, pointing at data buffers; the flow stream context at Context_A is updated, and a Status is returned in each FD]
Compression Capabilities
Compression:
• ZLIB, GZIP and DEFLATE header insertion
• ZLIB and GZIP CRC computation and insertion
• Zlib sync flush and partial flush for chunked compression (e.g. for HTTP/1.1)
• 4 modes of compression
− No compression (just add DEFLATE header)
− Encode only using static/dynamic Huffman codes
− Compress and encode using static Huffman codes
− Compress and encode using dynamic Huffman codes
• Uses a 4KB sliding history window
• Supports Base 64 encoding (RFC4648) after compression
• Provides at least 2.5:1 compression ratio on the Calgary Corpus
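The sync-flush behaviour used for chunked compression can be sketched with the zlib library the DCE interoperates with: each Z_SYNC_FLUSH emits a byte-aligned block the receiver can decode immediately, without ending the stream (a software analogue, not the hardware path):

```python
import zlib

chunks = [b"chunk-one " * 50, b"chunk-two " * 50, b"chunk-three " * 50]

comp = zlib.compressobj(level=6, wbits=15)   # zlib-wrapped stream
decomp = zlib.decompressobj(wbits=15)
recovered = b""

for chunk in chunks:
    # Z_SYNC_FLUSH aligns output on a byte boundary and flushes all
    # pending input, so each wire chunk is independently deliverable
    # (the pattern HTTP/1.1 chunked transfer relies on).
    wire = comp.compress(chunk) + comp.flush(zlib.Z_SYNC_FLUSH)
    recovered += decomp.decompress(wire)     # decodable before stream end

recovered += decomp.decompress(comp.flush(zlib.Z_FINISH))  # terminate
assert recovered == b"".join(chunks)
print(len(recovered))  # → 1600
```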
Decompression Capabilities
Decompression supports:
• ZLIB, GZIP and DEFLATE header removal
• ZLIB and GZIP CRC validation
• 32KB history
• Zlib flush for chunked decompression (e.g. for HTTP/1.1)
• All standard modes of decompression
− No compression
− Static Huffman codes
− Dynamic Huffman codes
• Provides the option to return the original compressed frame along with the uncompressed frame, or to release the buffers to BMan
• Does not support use of ZLIB preset dictionaries
− zlib data format, FLG (FLaGs) field bit 5, FDICT = 1 is treated as an error.
• Base 64 decoding (RFC4648) prior to decompression
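The FDICT condition the DCE rejects is visible in the first two bytes of a zlib stream, and the Base64 decode happens before inflate. Both can be mirrored in software (the helper name is ours; the bit test follows RFC 1950, FLG byte bit 5):

```python
import base64
import zlib

def zlib_uses_preset_dict(stream: bytes) -> bool:
    """RFC 1950: FLG is byte 1 of the stream; bit 5 (FDICT) signals a
    preset dictionary. The DCE treats FDICT = 1 as an error."""
    return len(stream) >= 2 and bool(stream[1] & 0x20)

payload = zlib.compress(b"hello dce" * 200)
assert not zlib_uses_preset_dict(payload)   # ordinary streams: FDICT = 0

# Base64 decode (RFC 4648) precedes decompression, mirroring the DCE
# option of decoding prior to inflate.
encoded = base64.b64encode(payload)
decoded = zlib.decompress(base64.b64decode(encoded))
assert decoded == b"hello dce" * 200
```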
DCE Software (SDK)
• The DCE driver software includes a Linux kernel driver. The driver provides a set of kernel level APIs.
• The driver includes the following functionality.
− DCE Foundation Library interface
The DCE FLIB interface provides a consistent interface to the CCSR registers, the memory defined DMA structures and to the dce_flow software object.
− DCE Configuration interface
The DCE configuration interface is an encapsulation of the DCE CCSR register space and the global/error interrupt source. This is expected to be managed only by (and visible to) a control-plane operating system.
− DCE User-space Interface
There is a debugfs interface available for device debugging. No other userspace interface is available.
− DCE Kernel Driver Interface
The DCE kernel driver APIs provide a callback based interface to the DCE. The driver provides APIs to perform either chunk based (de)compression or stream based (de)compression. The driver internally co-ordinates commands to the DCE and corresponding results from the DCE.
DCE PROGRAM
MODEL
Life of A DCE Packet (DPAA1)
[Diagram: an Ethernet TCP/IP frame enters via FMan (parse, classify, distribute over 1G and 1/10G interfaces) and lands on a core's pool channel (WQ0-WQ7) via an Rx FQ; the core looks up flow information (IPFwd) and enqueues an FD with a compress command through its portal to the DCE's hardware channel; the compressed frame returns through a pool channel to a core, which enqueues it on a Tx FQ for Ethernet transmission, e.g. for WAN optimisation]
DCE Driver
• The DCE device is configured via device-tree nodes and by some compile-time options controlled via Linux's Kconfig system.
• Refer to the SDK "DCE Kernel Configure Options" section for more info.
− The DCE kernel configure options are:

Common Kernel Configure Options | Description
CONFIG_STAGING                  | Required in order to make "staging" drivers such as DCE available.
CONFIG_FSL_DCE                  | Required to build DCE support.
CONFIG_FSL_DCE_CONFIG           | Compiles in DCE device driver support.
CONFIG_FSL_DCE_DEBUGFS          | Compiles in support for the debugfs interface for the DCE.
CONFIG_FSL_DCE_TESTS            | Compiles DCE test code.
DCE APIs
• For the Linux kernel, the C interface of the DCE drivers provides access to portal-based functionality for arbitrary
higher-layer code, hiding all the mux/demux/locking details required for shared use by multiple driver layers.
• The driver makes 1-to-1 associations between CPUs and DPAA software portals to improve cache locality and
reduce locking requirements.
• The QMan API permits users to work with Frame Queues and callbacks, independently of other users and associated
portal details.
• The BMan API permits users to work with Buffer Pools in a similar manner.
Source Files | Description
drivers/staging/fsl_dce/fsl_dce_chunk.h | The DCE driver APIs for chunk-based (de)compression.
drivers/staging/fsl_dce/fsl_dce_stream.h | The DCE driver APIs for stream-based (de)compression.
drivers/staging/fsl_dce/flib/*.* | The DCE foundation library (flib) interface.
drivers/staging/fsl_dce/flib/dce_regs.h | The DCE CCSR register macros. Used in conjunction with the bitfield_macros.h macros.
drivers/staging/fsl_dce/flib/dce_defs.h | The DCE DMA-defined memory structures.
drivers/staging/fsl_dce/flib/dce_flow.h | Object which defines the transport mechanism with the DCE engine. It encompasses the QMan frame queues required to communicate with the DCE. The chunk and stream objects use the flow object as a base.
DCE Debugging
• The DCE has a debugfs interface for run time debugging.
− Debugfs provides easy access to DCE memory map registers space.
− Refer to the DPAA Reference Manual for the "DCE Individual Register Memory Map", e.g.
0x000 DCE_CFG — DCE configuration
0x03C DCE_IDLE— DCE Idle status Register
0x3F8 DCE_IP_REV_1 — DCE IP Block Revision 1 register
• Mount debugfs to explore DCE status
mount -t debugfs none /sys/kernel/debug
root@t4240qds:/dev/shm# cat /sys/kernel/debug/dce/ccsrmem_addr
DCE register offset = 0x0
root@t4240qds:/dev/shm# cat /sys/kernel/debug/dce/ccsrmem_rw
DCE register offset = 0x0
value = 0x00000003 <- DCE configuration, 0x03 = Enable: block is operational, Frame Queues are consumed.
root@t4240qds:/dev/shm# echo 0x03c > /sys/kernel/debug/dce/ccsrmem_addr
root@t4240qds:/dev/shm# cat /sys/kernel/debug/dce/ccsrmem_rw
DCE register offset = 0x3c
value = 0x00000001 <- DCE Idle status Register, 1 = idle
root@t4240qds:/dev/shm# echo 0x3f8 > /sys/kernel/debug/dce/ccsrmem_addr
root@t4240qds:/dev/shm# cat /sys/kernel/debug/dce/ccsrmem_rw
DCE register offset = 0x3f8
value = 0x0af00101 <-match default value of “0x0AF0_0101”
DCE PERFORMANCE
ANALYSIS
DCE Test Procedure
Refer to drivers/staging/fsl_dce/tests/performance_simple/README for detailed descriptions of the sample DCE throughput performance tests.
ll performance_simple/
total 1864
-rw-r--r-- 1 r01360 r01360 8 Apr 12 16:09 built-in.o
-rw-r--r-- 1 r01360 r01360 25757 Apr 12 15:49 dce_perf_simple.c
-rw-rw-r-- 1 r01360 r01360 250048 Apr 12 16:10 dce_perf_simple.o
-rw-r--r-- 1 r01360 r01360 30428 Apr 12 15:49 dce_sf_perf_simple.c
-rw-rw-r-- 1 r01360 r01360 261984 Apr 12 16:10 dce_sf_perf_simple.o
-rw-rw-r-- 1 r01360 r01360 316717 Apr 12 16:10 dce_simple_perf_tester.ko
-rw-rw-r-- 1 r01360 r01360 3413 Apr 12 16:10 dce_simple_perf_tester.mod.c
-rw-rw-r-- 1 r01360 r01360 68192 Apr 12 16:10 dce_simple_perf_tester.mod.o
-rw-rw-r-- 1 r01360 r01360 250073 Apr 12 16:10 dce_simple_perf_tester.o
-rw-rw-r-- 1 r01360 r01360 328793 Apr 12 16:10 dce_simple_sf_perf_tester.ko
-rw-rw-r-- 1 r01360 r01360 3547 Apr 12 16:10 dce_simple_sf_perf_tester.mod.c
-rw-rw-r-- 1 r01360 r01360 68328 Apr 12 16:10 dce_simple_sf_perf_tester.mod.o
-rw-rw-r-- 1 r01360 r01360 262002 Apr 12 16:10 dce_simple_sf_perf_tester.o
-rw-r--r-- 4 r01360 r01360 221 Apr 12 15:49 Makefile
-rw-rw-r-- 1 r01360 r01360 0 Apr 12 16:10 modules.builtin
-rw-rw-r-- 1 r01360 r01360 167 Apr 12 16:10 modules.order
-rw-r--r-- 4 r01360 r01360 3355 Apr 12 15:49 README
Kernel Space Performance Test
• Asynchronous
− Stateful test (Uses fsl_dce_stream.h layer):
dce_simple_sf_perf_tester.
− Stateless test (Uses fsl_dce_chunk.h layer):
dce_simple_perf_tester.
• Synchronous (Use zdce.h DCE zlib layer)
− Stateful tests
zpipe and zspeed.
root@t4240qds:# insmod ./dce_perf_simple_test.ko test_mode=0 in_file="InputData1" out_file="InputData1.1_1.gz" comp_effort=1
Loading dce_perf_simple_test module
BMan data block size is 4096
…
DCE thread on cpu 1
Output length is 13249938
root@t4240qds:# rmmod dce_perf_test.ko
DCE Freq = 299999997 hz
CPU Freq: 1666666650
Cycles to complete = 28197315
Time (usec) to complete = 16925
Scaling factor (by 1000) = 1333
Total Input Bytes to Compress: 20000000
Input file size compression: 20000000 bytes
Compression throughput: 9453 Mbps (12600 Mbps for 400 MHz DCE)
Decompression throughput: None
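The figures the module prints follow directly from the reported numbers: throughput is input bits over elapsed time, and the "scaling factor (by 1000) = 1333" is simply the 400 MHz / ~300 MHz DCE clock ratio used to project full-speed performance. Reproducing the arithmetic from the run above:

```python
input_bytes = 20_000_000   # "Total Input Bytes to Compress" from the run
time_usec = 16_925         # "Time (usec) to complete" from the run

# bits per microsecond == megabits per second
mbps = input_bytes * 8 // time_usec          # throughput at ~300 MHz DCE

# Project to a full-speed 400 MHz DCE using the printed scaling factor
# (1333/1000, i.e. 400 MHz / 300 MHz).
mbps_400 = mbps * 1333 // 1000
print(mbps, mbps_400)  # → 9453 12600
```

Both values match the module's printed "9453 Mbps (12600 Mbps for 400 MHz DCE)".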
Kernel Space Stateless Performance Test
> modinfo dce_simple_perf_tester.ko
description: DCE loopback example
license: GPL
…
parm: verbose_level:verbosity level: 0 low, 1 is high (default=0) (int)
parm: bman_output:int
parm: test_mode:test_mode: 0 is compression, 1 is decompression (default=0) (int)
parm: b_sg_block_size_code:Size of bman buffers used to create s/g tables (default=4096) (int)
parm: b_sg_block_count:Number of s/g bman buffers to release (default=50) (int)
parm: b_dexp:Bman dexp value, default= (int)
parm: b_dmant:Bman dmant value, default= (int)
parm: block_size:Size of individual input data blocks in s/g (default=4096) (int)
parm: use_local_file:Use the included local header file for (de)compression. The value specifies the input size. Supported value are 0, 2, 4, 8, 12 (default=0) (int)
parm: comp_effort:Compression Effort, default=1 (int)
parm: in_file:Input file to (de)compress (charp)
parm: out_file:Output file result of (de)compression (charp)
parm: comp_ratio:The compression ratio to be used for allocated output data buffer (int)
parm: output_size:The extra output size to allocate (int)
parm: bman_data_size:The size of the data buffer pool (int)
Kernel Space Stateful Performance Test
> modinfo dce_simple_sf_perf_tester.ko
description: DCE stateful performance test
license: Dual BSD/GPL
…
parm: verbose_level:verbosity level: 0 low, 1 is high (default=0) (int)
parm: bman_output:int
parm: test_mode:test_mode: 0 is compression, 1 is decompression (default=0) (int)
parm: b_sg_block_size_code:Size of bman buffers used to create s/g tables (default=4096) (int)
parm: b_sg_block_count:Number of s/g bman buffers to release (default=50) (int)
parm: b_dexp:Bman dexp value, default=12 (int)
parm: b_dmant:Bman dmant value, default=1 (int)
parm: block_size:Size of individual input data blocks in s/g (default=4096) (int)
parm: use_local_file:Use the included local header file for (de)compression. The value specifies the input size. Supported value are 0, 2, 4, 8, 12 (default=0) (int)
parm: comp_effort:Compression Effort, default=1 (int)
parm: in_file:Input file to (de)compress (charp)
parm: out_file:Output file result of (de)compression (charp)
parm: comp_ratio:The compression ratio to be used for allocated output data buffer (int)
parm: output_size:The extra output size to allocate (int)
parm: bman_data_size:The size of the data buffer pool in bytes (int)
parm: chunking_size:How many input bytes to send at a time (int)
DPAA2: User Space Performance Test
• Asynchronous
− Stateful test (Uses fsl_dce_stream.h layer): dce_simple.
• Synchronous
− Stateful tests (use zdce.h DCE zlib layer): zlib_test and zpipe.
• insmod fsl-dce-zspeed.ko in_file=/home/root/faketato num_threads=5
zpipe about to start test. The input file is /home/root/faketato. The output file is (null). The number of threads is 5. The number of loops is 1. The work unit size is 131072
Size of file: /home/root/faketato is 24736902
test wait produced 363493937
time according to jiffies = 28
1 loops took 0 secs. CPU Cycles 199441737. CPU frequency is 1799999982
total_total_in = 123684510
Rate is 8930211664 bps <- ~8.9Gbps
Finished test
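The reported rate is consistent with the other numbers in the log: the total input is the 24,736,902-byte file processed once by each of the 5 threads, and elapsed time comes from the CPU cycle counter. Recomputing:

```python
total_in = 123_684_510      # bytes: 24,736,902-byte input x 5 threads
cycles = 199_441_737        # "CPU Cycles" from the log
cpu_hz = 1_799_999_982      # "CPU frequency" from the log

seconds = cycles / cpu_hz             # ~0.111 s of wall time
rate_bps = total_in * 8 / seconds     # ~8.93e9 bps, matching the log
print(round(rate_bps / 1e9, 1))       # → 8.9
```

The result agrees with the printed "Rate is 8930211664 bps" to within integer-arithmetic rounding.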
Demo: T4240 Development System
U-Boot 2016.012.0+g6dfe545 (Mar 30 2016 - 16:49:09 +0530)
CPU0: T4240E, Version: 2.0, (0x82480020)
Core: e6500, Version: 2.0, (0x80400120)
Clock Configuration:
CPU0:1666.667 MHz, CPU1:1666.667 MHz, CPU2:1666.667 MHz, CPU3:1666.667 MHz,
CPU4:1666.667 MHz, CPU5:1666.667 MHz, CPU6:1666.667 MHz, CPU7:1666.667 MHz,
CPU8:1666.667 MHz, CPU9:1666.667 MHz, CPU10:1666.667 MHz, CPU11:1666.667 MHz,
CCB:733.333 MHz,
DDR:933.333 MHz (1866.667 MT/s data rate) (Asynchronous), IFC:183.333 MHz
FMAN1: 733.333 MHz
FMAN2: 733.333 MHz
QMAN: 366.667 MHz
PME: 533.333 MHz
L1: D-cache 32 KiB enabled
I-cache 32 KiB enabled
Reset Configuration Word (RCW):
00000000: 16070019 18101916 00000000 00000000
00000010: 04022828 00558c00 ec020000 f5000000
00000020: 00000000 ee0000ee 00000000 000307fc
00000030: 00000000 00000000 00000000 00000028
I2C: ready
Board: T4240QDS, Sys ID: 0x1e, Sys Ver: 0x22, vBank: 4
FPGA: v3 (T4240QDS_2012_1113_1114), build 438 on Tue Nov 13 17:14:23 2012
SERDES Reference Clocks: SERDES1=125MHz SERDES2=125MHz SERDES3=100MHz SERDES4=100MHz
Gzip vs. DCE: 20x Performance Improvement
• Test Input and output (in ramdisk)
-rw-r--r-- 1 root root 1.1G Apr 13 09:47 dce_webdata1G
-rw-r--r-- 1 root root 177M Apr 13 09:50 dce_webdata1G.gz
• Test Time
>time gzip dce_webdata1G VS >time insmod fsl-dce-zspeed.ko mode=0 num_threads=1 in_file=dce_webdata1G
real 1m58.001s real 0m5.016s
user 1m55.226s user 0m0.001s
sys 0m2.769s sys 0m3.456s
• CPU Utilization (mpstat)
Linux 3.12.37-rt51-01298-g283a93c-dirty (t4240qds) 04/13/16 _ppc64_ (24 CPU)
09:48:18 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
09:48:20 7 51.76 0.00 1.01 0.00 0.00 0.00 0.00 0.00 0.00 47.24
09:48:22 7 98.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
09:48:24 7 97.50 0.00 2.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Or
Average: all 3.80 0.00 0.09 0.00 0.01 0.00 0.00 0.00 0.00 96.10
Average: 7 91.28 0.00 1.93 0.00 0.07 0.00 0.00 0.00 0.00 6.72
Vs
10:08:46 0 0.00 0.00 32.16 0.00 0.00 0.00 0.00 0.00 0.00 67.84
10:08:48 0 0.00 0.00 99.50 0.00 0.50 0.00 0.00 0.00 0.00 0.00
10:08:50 0 0.00 0.00 41.11 0.00 7.22 0.00 0.00 0.00 0.00 51.67
Or
Average: all 0.01 0.00 1.78 0.00 0.08 0.01 0.00 0.00 0.00 98.12
Average: 0 0.00 0.00 43.26 0.00 1.80 0.00 0.00 0.00 0.00 54.94
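The headline "20x" is conservative against the raw wall-clock times above, which work out to roughly a 23x speedup:

```python
gzip_wall = 1 * 60 + 58.001   # `time gzip`: real 1m58.001s
dce_wall = 5.016              # DCE module run: real 0m5.016s

speedup = gzip_wall / dce_wall
print(round(speedup, 1))      # → 23.5
```

Note the mpstat averages tell the same story from the CPU side: gzip keeps one core ~91% busy in user space, while the DCE run shows essentially zero user time, with only kernel-side setup cost on one core.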
Summary
• The Decompression and Compression Engine (DCE) is an accelerator, compatible with the Data Path Acceleration Architecture, that provides lossless data compression and decompression for the QorIQ family of SoCs.
• DCE is a perfect complement for devices that need to optimize data storage and network bandwidth. DCE is standards-compliant; it offers fast execution and frees the CPU to focus on other applications.
• The Software Development Kit provides drivers and APIs for developers to adopt DCE in their applications. Reference applications are available for both user and kernel space.
• NXP FTF Sessions:
− FTF-NET-N1878 Storage - Complete Storage Dedupe offload using NXP's Intelligent Storage Accelerator. Next gen SDS platform
• Demo
− QorIQ Leadership Security Functionality
• Further Reference
− QorIQ T4240 Data Path Acceleration Architecture (DPAA) Reference Manual (T4240DPAARM)
− Data Path Acceleration Architecture, Second Generation (DPAA2) Hardware Reference Manual (DPAA2RM)
Software Products and Services: Visit us in the Tech Lab, #247
Simplify Software Engagement with NXP: Deliver Commercial Software, Support, Services and Solutions; Accelerate Customer Time-to-Market; Create Success!
• Runtime Products • Development Tools • Reference Solutions • Integration Services
• Linux® Services • Hardened Linux • OpenWRT+ • VortiQa Software Solutions • CodeWarrior • IoT Gateway • Security Consulting • Performance Tuning • Commercial Support
Find us online at www.nxp.com/networking-services
ATTRIBUTION STATEMENT
NXP, the NXP logo, NXP SECURE CONNECTIONS FOR A SMARTER WORLD, CoolFlux, EMBRACE, GREENCHIP, HITAG, I2C BUS, ICODE, JCOP, LIFE VIBES, MIFARE, MIFARE Classic, MIFARE
DESFire, MIFARE Plus, MIFARE FleX, MANTIS, MIFARE ULTRALIGHT, MIFARE4MOBILE, MIGLO, NTAG, ROADLINK, SMARTLX, SMARTMX, STARPLUG, TOPFET, TrenchMOS, UCODE, Freescale,
the Freescale logo, AltiVec, C 5, CodeTEST, CodeWarrior, ColdFire, ColdFire+, C Ware, the Energy Efficient Solutions logo, Kinetis, Layerscape, MagniV, mobileGT, PEG, PowerQUICC, Processor Expert,
QorIQ, QorIQ Qonverge, Ready Play, SafeAssure, the SafeAssure logo, StarCore, Symphony, VortiQa, Vybrid, Airfast, BeeKit, BeeStack, CoreNet, Flexis, MXC, Platform in a Package, QUICC Engine,
SMARTMOS, Tower, TurboLink, and UMEMS are trademarks of NXP B.V. All other product or service names are the property of their respective owners. ARM, AMBA, ARM Powered, Artisan, Cortex,
Jazelle, Keil, SecurCore, Thumb, TrustZone, and μVision are registered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. ARM7, ARM9, ARM11, big.LITTLE, CoreLink,
CoreSight, DesignStart, Mali, mbed, NEON, POP, Sensinode, Socrates, ULINK and Versatile are trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. Oracle and
Java are registered trademarks of Oracle and/or its affiliates. The Power Architecture and Power.org word marks and the Power and Power.org logos and related marks are trademarks and service marks
licensed by Power.org. © 2015–2016 NXP B.V.