Page 1: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

PUBLIC USE

SAM SIU

INDEPENDENT CONTRIBUTOR

FTF-NET-N1848

MAY 18, 2016

INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

Page 2: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

AGENDA

• Overview of the Storage Market

• Overview of DCE Hardware

• DCE Program Model

• DCE Performance Analysis

Page 3: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

Introduction

• QorIQ DPAA includes a hardware accelerator, the Decompression and Compression Engine (DCE), that can be used in storage and bandwidth optimization applications.

• This session will cover the following topics:

− Overview of the storage market

− Overview of DCE hardware

− DCE program model for kernel and user space applications.

− DCE performance analysis.

Page 4: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

OVERVIEW OF THE STORAGE MARKET

Page 5: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

Storage Applications Overview

[Figure: storage application segments, numbered I to V on the slide. Labels: Software Defined Storage Platform (SDS); Storage Controllers (Arrays); cNAS/Prosumer NAS; DataCenter Cold Storage; Ethernet Drive Cold Storage; Consumer NAS.]

Page 6: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

Consumer/Prosumer NAS

Scale of offerings:

• 8 Drive, dual 2.5G/10G

• 5 Drive, dual 2.5G/10G

• 4 Drive, dual 2.5G/10G

• 2 Drive, dual 2.5G

• 2 Drive, dual 2.5G

Requirements:

2-8 ARM cores, Dual 2.5G/10G, RAID 5/6, SATA, PCIe, Wi-Fi, USB, LRO/TSO offloads, IPSec offload, encryption

[Figure: block diagram with the labels PCIe Gen3 (twice), 10G (twice), Encryption, RAID, Compression, Deduplication, and SAS Controller (twice).]

Page 7: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

NXP Storage Solutions on Selected Device

Function                     LS2088A             T4240
10G Ethernet                 8 x 10G + 8 x 1G    4 x 10G + 6 x 1G
NFS/TCP/iSCSI termination    20Gbps              30Gbps
Rolling Hash                 15Gbps              25Gbps
SHA-1 Hash                   15Gbps              25Gbps
Compression                  10Gbps              10Gbps
Encryption                   20Gbps              20Gbps
RAID 5/6                     Yes                 Yes
Power                        25-40W              41-54W

Page 8: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

Network Storage Accelerated Solution

• 8/12/24* Core Power & ARM 64b

• Multiple integrated 10G/25G* interfaces with DCB support

• PCIe Gen3 & Gen4* controllers

• iSER on CPU cores

• Dedupe support

• Compression

• Encryption

• RAID 5/6 or Erasure Coding

* Next gen SoCs

[Figure: an NXP LS or T4/T2 device with four 25G Ethernet interfaces, connected over two PCIe Gen3 x4 links to two PCIe switches, each serving SSD #1 and SSD #2.]

Page 9: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

Data Path Acceleration Architecture 1 & 2

Network Acceleration: Saving CPU Cycles for higher value work

• FMAN (Frame Manager): 50Gbps Classify, Parse, Distribute aggregate

• SEC (Security): 40Gbps IPSec, SSL; Public Key 25K/s 1024b RSA

• PME (Pattern Matching): 10Gbps aggregate

• DCE (Data Compression): 20Gbps aggregate (10G inflate + 10G deflate)

[Figure: block diagram, "Performance & Scalability". The core complex is either LayerScape (DPAA2), with ARM A57 cores (32KB L1-D and 48KB L1-I each) and 1MB banked L2 caches, or a QorIQ device (DPAA1), with dual-threaded Power Architecture e6500 cores (32 KB D-cache and 32 KB I-cache each) and a 2MB banked L2. SW portals and HW portals connect the cores to QMan/BMan, SEC, PME, and DCE. Network I/O is WRIOP with AIOP on DPAA2 or FMan on DPAA1, each with Parse, Classify, Distribute and Buffer blocks and 1/10G and 1G ports.]

Page 10: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

DCE Enabled Device

• DCE is an optional hardware accelerator for Data Path Acceleration Architecture (DPAA) enabled devices.

• DPAA1.x

− T4240, T2080, …

• DPAA2.x

− LS2088A, LS2085A, LS2080A, …

Page 11: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

OVERVIEW OF DCE HARDWARE

Page 12: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

DCE Hardware Overview

• Deflate

− As specified in RFC1951

• GZIP

− As specified in RFC1952

• Zlib

− As specified in RFC1950

− Interoperable with the zlib 1.2.5 compression library

• Encoding

− supports Base 64 encoding and decoding (RFC4648)

• Operates at up to 400 MHz

− 10Gbps Compress

− 10Gbps Decompress

− 20Gbps Aggregate
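The three container formats listed above can be illustrated in software. The following is a minimal sketch that uses Python's standard `zlib` and `base64` modules as a stand-in for the hardware; it is not the DCE driver API. The helper name `deflate` and the sample payload are assumptions made for illustration.

```python
import base64
import zlib


def deflate(data: bytes, wbits: int) -> bytes:
    """Compress data with software zlib; wbits selects the container format."""
    comp = zlib.compressobj(6, zlib.DEFLATED, wbits)
    return comp.compress(data) + comp.flush()


payload = b"QorIQ DCE sample payload. " * 400

raw = deflate(payload, -15)  # RFC 1951: raw DEFLATE, no header or trailer
zl = deflate(payload, 15)    # RFC 1950: 2-byte header, Adler-32 trailer
gz = deflate(payload, 31)    # RFC 1952: 10-byte header, CRC-32 + ISIZE trailer

# Each wrapper carries a DEFLATE body that inflates to the same payload.
assert zlib.decompress(raw, -15) == payload
assert zlib.decompress(zl[2:-4], -15) == payload
assert zlib.decompress(gz[10:-8], -15) == payload

# RFC 4648 Base64 applied after compression and reversed before decompression.
encoded = base64.b64encode(zl)
assert zlib.decompress(base64.b64decode(encoded)) == payload

print(len(payload), len(raw), len(zl), len(gz))
```

The only difference between the three outputs is the wrapper around the DEFLATE body, which is why one engine can serve all three formats.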

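[Figure: DCE block diagram. A Frame Agent connects the Decompressor (32KB history) and the Compressor (4KB history) to three interfaces: the QMan I/F to the QMan portal, the BMan I/F to the BMan portal, and the Bus I/F to the system bus.]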

Page 13: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

DCE Inputs

• Software enqueues work to DCE via Frame Queues (FQ). FQs define the flow for stateful processing.

• FQ initialization creates a location for the DCE to use when storing flow stream context.

• Each work item within the flow is defined by a Frame Descriptor (FD), which includes length, pointer, offsets, and commands.

• DCE has separate channels for compress and decompress.
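The relationship between flows, frame queues, and frame descriptors can be pictured with a toy software model. This is only a sketch: the class names, the `cmd` values, and the use of a software `zlib` object as the flow stream context are all assumptions for illustration and do not reflect the hardware structures or the SDK API.

```python
import zlib
from dataclasses import dataclass, field


@dataclass
class FrameDescriptor:
    """Hypothetical stand-in for an FD: where the data is and what to do."""
    buffer: bytes  # stands in for the address pointer
    offset: int
    length: int
    cmd: str       # "process" or "finish" (made-up command names)


@dataclass
class FrameQueue:
    """One FQ per flow; its context persists between FDs (stateful)."""
    fqid: int
    context_a: object = field(default_factory=zlib.compressobj)


def dce_compress(fq: FrameQueue, fd: FrameDescriptor) -> bytes:
    """Process one work item using the stream context of its flow."""
    data = fd.buffer[fd.offset:fd.offset + fd.length]
    out = fq.context_a.compress(data)
    if fd.cmd == "finish":
        out += fq.context_a.flush()
    return out


flow_a, flow_b = FrameQueue(fqid=0x100), FrameQueue(fqid=0x101)
buf_a, buf_b = b"alpha " * 500, b"bravo " * 500
half = len(buf_a) // 2

# Work items from two flows arrive interleaved on the compression channel.
work = [
    (flow_a, FrameDescriptor(buf_a, 0, half, "process")),
    (flow_b, FrameDescriptor(buf_b, 0, half, "process")),
    (flow_a, FrameDescriptor(buf_a, half, len(buf_a) - half, "finish")),
    (flow_b, FrameDescriptor(buf_b, half, len(buf_b) - half, "finish")),
]
results = {flow_a.fqid: b"", flow_b.fqid: b""}
for fq, fd in work:
    results[fq.fqid] += dce_compress(fq, fd)

print({hex(fqid): len(out) for fqid, out in results.items()})
```

Because each flow keeps its own context, interleaving work items from different flows does not corrupt either output stream.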

[Figure: DCE input path. Command FQs, each with a flow stream context referenced by Context_A, carry Frame Descriptors (FD1 to FD3, with the fields PID, BPID, Addr, Offset, Length, and Status/Cmd, each pointing to a data buffer). The FDs pass through work queues WQ0 to WQ7 of two channels, labeled 0 and 1, and reach the decompressor and the compressor in the DCE through a direct connect portal.]

Page 14: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

DCE Outputs

• DCE enqueues results to software via Frame Queues (FQ), as defined by the FQ Context_B field. When output buffers are obtained from BMan, the buffer pool ID is defined by the output FQ.

• Each result is defined by a Frame Descriptor (FD), which includes a Status field.

• DCE updates flow stream context located at Context_A as needed.

• Note: Context_A and Context_B are FQ attributes that can store flow-related data and output information. Refer to the DPAA Reference Manual or the SDK APIs for details.
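The result path can be sketched in the same toy style: each work item yields a result carrying a status, and the result lands on the output queue named by Context_B. The status codes, the `ResultFD` class, and the `process` helper below are hypothetical and chosen only for illustration; real status values are defined in the DPAA Reference Manual.

```python
import zlib
from collections import defaultdict, deque
from dataclasses import dataclass

STATUS_OK = 0           # hypothetical status codes, not the hardware values
STATUS_INPUT_ERROR = 1


@dataclass
class ResultFD:
    """Hypothetical result descriptor: output data plus a status field."""
    data: bytes
    status: int


def dce_decompress(input_data: bytes) -> ResultFD:
    """Software stand-in for one decompression work item."""
    try:
        return ResultFD(zlib.decompress(input_data), STATUS_OK)
    except zlib.error:
        return ResultFD(b"", STATUS_INPUT_ERROR)


output_fqs = defaultdict(deque)  # output FQ id -> queued result FDs


def process(context_b: int, input_data: bytes) -> None:
    """Run one work item and enqueue its result on the FQ named by Context_B."""
    output_fqs[context_b].append(dce_decompress(input_data))


process(0x200, zlib.compress(b"hello DCE"))
process(0x200, b"not a zlib stream")

good, bad = output_fqs[0x200]
print(good.status, good.data, bad.status)
```

Software consuming the output queue checks the status of every result before using its data, since a malformed input still produces a result frame.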

[Figure: DCE output path. The decompressor and the compressor in the DCE enqueue result FDs (FD1 to FD3, with the fields PID, BPID, Addr, Offset, Length, and Status/Cmd, each pointing to a data buffer) through a portal onto the output (status) FQs, and update the flow stream context referenced by Context_A.]

Page 15: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

Compression Capabilities

Compression:

• ZLIB, GZIP and DEFLATE header insertion

• ZLIB and GZIP CRC computation and insertion

• Zlib sync flush and partial flush for chunked compression (for HTTP1.1 for example)

• 4 modes of compression

− No compression (just add DEFLATE header)

− Encode only using static/dynamic Huffman codes

− Compress and encode using static Huffman codes

− Compress and encode using dynamic Huffman codes

• Uses a 4KB sliding history window

• Supports Base 64 encoding (RFC4648) after compression

• Provides at least 2.5:1 compression ratio on the Calgary Corpus
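Several of these capabilities have direct software analogues, which makes them easy to experiment with. The sketch below uses Python's `zlib` module, not the DCE, and assumes a loose mapping of the four modes onto zlib settings (level 0, `Z_HUFFMAN_ONLY`, `Z_FIXED`, and the default strategy); that mapping is an illustration, not a statement about how the hardware is programmed. A `wbits` value of 12 selects a 4 KB history window.

```python
import random
import zlib

# Low-entropy test data: 20000 bytes drawn from a 4-letter alphabet.
rng = random.Random(0)
data = bytes(rng.choice(b"abcd") for _ in range(20000))


def raw_deflate(payload, level, strategy=zlib.Z_DEFAULT_STRATEGY):
    """Raw DEFLATE (RFC 1951) with a 4 KB window: wbits=-12 means 2**12 bytes."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -12, 8, strategy)
    return comp.compress(payload) + comp.flush()


def first_block_type(stream):
    """BTYPE of the first DEFLATE block: 0 stored, 1 static, 2 dynamic Huffman."""
    return (stream[0] >> 1) & 0b11


modes = {
    "no compression": raw_deflate(data, 0),
    "Huffman only": raw_deflate(data, 6, zlib.Z_HUFFMAN_ONLY),
    "static Huffman": raw_deflate(data, 6, zlib.Z_FIXED),
    "dynamic Huffman": raw_deflate(data, 6),
}
for name, stream in modes.items():
    # A decoder with a 32 KB window accepts streams built with a 4 KB window.
    assert zlib.decompress(stream, -15) == data
    print(f"{name:16s} BTYPE={first_block_type(stream)} bytes={len(stream)}")

# Chunked compression with a sync flush after every chunk (zlib format).
comp = zlib.compressobj(6, zlib.DEFLATED, 12)
decomp = zlib.decompressobj()
received = b""
for start in range(0, len(data), 4096):
    wire = comp.compress(data[start:start + 4096]) + comp.flush(zlib.Z_SYNC_FLUSH)
    assert wire.endswith(b"\x00\x00\xff\xff")  # empty stored block marks the flush
    received += decomp.decompress(wire)
assert received == data
```

A sync flush ends each chunk on a byte boundary, so the receiver can decode everything sent so far without waiting for the end of the stream, which is what chunked transfer over HTTP/1.1 needs.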

Page 16: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

Decompression Capabilities

Decompression supports:

• ZLIB, GZIP and DEFLATE header removal

• ZLIB and GZIP CRC validation

• 32KB history

• Zlib flush for chunked decompression (for HTTP1.1 for example)

• All standard modes of decompression

− No compression

− Static Huffman codes

− Dynamic Huffman codes

• Provides option to return original compressed Frame along with the uncompressed Frame or release the buffers to BMan

• Does not support use of ZLIB preset dictionaries

− zlib data format, FLG (FLaGs) field bit 5, FDICT = 1 is treated as an error.

• Base 64 decoding (RFC4648) prior to decompression
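Because a stream with FDICT set is treated as an error, it can be useful to inspect the zlib header in software before submitting data for decompression. The following sketch parses the two header bytes defined in RFC 1950; the function name `parse_zlib_header` and the sample dictionary are assumptions for illustration.

```python
import zlib


def parse_zlib_header(stream: bytes) -> dict:
    """Decode the CMF and FLG bytes at the start of a zlib (RFC 1950) stream."""
    cmf, flg = stream[0], stream[1]
    return {
        "method": cmf & 0x0F,                    # 8 means DEFLATE
        "window_bytes": 1 << ((cmf >> 4) + 8),   # CINFO encodes log2(window) - 8
        "fdict": bool(flg & 0x20),               # FLG bit 5: preset dictionary
        "header_ok": (cmf * 256 + flg) % 31 == 0,
    }


payload = b"QorIQ DCE DPAA frame queue " * 50

plain = zlib.compress(payload)

# A stream that relies on a preset dictionary sets FDICT in its header.
comp = zlib.compressobj(zdict=b"QorIQ DCE DPAA")
with_dict = comp.compress(payload) + comp.flush()

print(parse_zlib_header(plain))
print(parse_zlib_header(with_dict))

# A decoder that has no dictionary cannot inflate the second stream.
try:
    zlib.decompress(with_dict)
    rejected = False
except zlib.error:
    rejected = True
print("rejected without dictionary:", rejected)
```

Checking the `fdict` flag up front lets software route such streams to a software decoder instead of receiving an error result from the engine.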

Page 17: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

DCE Software (SDK)

• The DCE driver software includes a Linux kernel driver. The driver provides a set of kernel level APIs.

• The driver includes the following functionality.

− DCE Foundation Library interface

The DCE FLIB interface provides a consistent interface to the CCSR registers, the memory defined DMA structures and to the dce_flow software object.

− DCE Configuration interface

The DCE configuration interface is an encapsulation of the DCE CCSR register space and the global/error interrupt source. This is expected to be managed only by (and visible to) a control-plane operating system.

− DCE User-space Interface

There is a debugfs interface available for device debugging. No other userspace interface is available.

− DCE Kernel Driver Interface

The DCE kernel driver APIs provide a callback based interface to the DCE. The driver provides APIs to perform either chunk based (de)compression or stream based (de)compression. The driver internally co-ordinates commands to the DCE and corresponding results from the DCE.

Page 18: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

DCE PROGRAM MODEL

Page 19: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

Life of a DCE Packet (DPAA1)

[Figure: path of a TCP/IP frame (Eth, L2, L3-4) through a DPAA1 device, for a use case such as WAN optimisation. Labels on the slide: Eth (Rx); FMan (Parse, Classify, Distribute) with 1/10G, 1G, HiGig, and DCB ports; Rx FQ; pool channel (WQ0 to WQ7); core portal ("Looks up flow information", IPFwd); FD (compress); HW channel (WQ0 to WQ7); DCE portal; compressed frame; Rx FQ; Tx FQ; HW portal; Eth (Tx).]

Page 20: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

DCE Driver

• The DCE device is configured via device-tree nodes and by some compile-time options controlled via Linux’s Kconfig system.

• Refer to the SDK “DCE Kernel Configure Options” section for more information.

− The DCE kernel configure options are:

Common Kernel Configure Options    Description
CONFIG_STAGING                     Required in order to make “staging” drivers such as DCE available.
CONFIG_FSL_DCE                     Required to build DCE support.
CONFIG_FSL_DCE_CONFIG              Compiles in DCE device driver support.
CONFIG_FSL_DCE_DEBUGFS             Compiles in support for debugfs interface for the DCE.
CONFIG_FSL_DCE_TESTS               Compiles DCE test code.
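A quick way to confirm that a kernel build has these options enabled is to scan its `.config` file. The sketch below is a hypothetical helper, not part of the SDK; the function names and the sample configuration text are made up for illustration, and only the five option names come from the table above.

```python
DCE_OPTIONS = [
    "CONFIG_STAGING",
    "CONFIG_FSL_DCE",
    "CONFIG_FSL_DCE_CONFIG",
    "CONFIG_FSL_DCE_DEBUGFS",
    "CONFIG_FSL_DCE_TESTS",
]


def enabled_options(config_text: str) -> set:
    """Return the option names set to y (built in) or m (module)."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skips comments such as "# CONFIG_X is not set"
        name, value = line.split("=", 1)
        if value in ("y", "m"):
            enabled.add(name)
    return enabled


def missing_dce_options(config_text: str) -> list:
    """List the DCE-related options that are not enabled."""
    enabled = enabled_options(config_text)
    return [name for name in DCE_OPTIONS if name not in enabled]


# Made-up sample; a real check would read the .config of the kernel build.
sample_config = """\
CONFIG_STAGING=y
CONFIG_FSL_DCE=y
CONFIG_FSL_DCE_CONFIG=y
# CONFIG_FSL_DCE_DEBUGFS is not set
CONFIG_FSL_DCE_TESTS=m
"""
print(missing_dce_options(sample_config))
```

In this sample only the debugfs option is reported, which would explain a missing `/sys/kernel/debug/dce` directory on the target.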

Page 21: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

DCE APIs

• For the Linux kernel, the C interface of the DCE drivers provides access to portal-based functionality for arbitrary higher-layer code, hiding all the mux/demux/locking details required for shared use by multiple driver layers.

• The driver makes 1-to-1 associations between CPUs and DPAA software portals to improve cache locality and reduce locking requirements.

• The QMan API permits users to work with Frame Queues and callbacks, independently of other users and associated portal details.

• The BMan API permits users to work with Buffer Pools in a similar manner.

Source Files                                Description
drivers/staging/fsl_dce/fsl_dce_chunk.h     The DCE driver APIs for chunk based (de)compression
drivers/staging/fsl_dce/fsl_dce_stream.h    The DCE driver APIs for stream based (de)compression
drivers/staging/fsl_dce/flib/*.*            The DCE foundation library (flib) interface
drivers/staging/fsl_dce/flib/dce_regs.h     The DCE CCSR register macros. Used in conjunction with bitfield_macros.h macros.
drivers/staging/fsl_dce/flib/dce_defs.h     The DCE DMA defined memory structures.
drivers/staging/fsl_dce/flib/dce_flow.h     Object which defines the transport mechanism with the DCE engine. This object encompasses the QMan frame queues required to communicate with the DCE. The chunk and stream objects use the flow object as a base.

Page 22: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

DCE Debugging

• The DCE has a debugfs interface for run-time debugging.

− Debugfs provides easy access to the DCE memory-mapped register space.

− Refer to the “DCE Individual Register Memory Map” in the DPAA Reference Manual. For example:

0x000 DCE_CFG: DCE configuration

0x03C DCE_IDLE: DCE Idle status Register

0x3F8 DCE_IP_REV_1: DCE IP Block Revision 1 register

• Mount debugfs to explore DCE status

mount -t debugfs none /sys/kernel/debug

root@t4240qds:/dev/shm# cat /sys/kernel/debug/dce/ccsrmem_addr

DCE register offset = 0x0

root@t4240qds:/dev/shm# cat /sys/kernel/debug/dce/ccsrmem_rw

DCE register offset = 0x0

value = 0x00000003 <-DCE configuration, x03= Enable. Block is operational, Frame Queues are consumed.

root@t4240qds:/dev/shm# echo 0x03c > /sys/kernel/debug/dce/ccsrmem_addr

root@t4240qds:/dev/shm# cat /sys/kernel/debug/dce/ccsrmem_rw

DCE register offset = 0x3c

value = 0x00000001 <- DCE Idle status Register, 1 = idle

root@t4240qds:/dev/shm# echo 0x3f8 > /sys/kernel/debug/dce/ccsrmem_addr

root@t4240qds:/dev/shm# cat /sys/kernel/debug/dce/ccsrmem_rw

DCE register offset = 0x3f8

value = 0x0af00101 <-match default value of “0x0AF0_0101”
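The write-offset-then-read sequence shown in this session can be wrapped in a small script. This is a sketch that assumes the two debugfs files behave exactly as in the session above (write an offset to `ccsrmem_addr`, then read `ccsrmem_rw`); the function names are hypothetical, and the register read itself needs root access on a target with the DCE debugfs interface enabled.

```python
import re
from pathlib import Path

DEBUGFS_DCE = Path("/sys/kernel/debug/dce")


def parse_ccsrmem_rw(text: str) -> tuple:
    """Extract (offset, value) from the text printed by ccsrmem_rw."""
    offset = re.search(r"DCE register offset\s*=\s*(0x[0-9a-fA-F]+)", text)
    value = re.search(r"value\s*=\s*(0x[0-9a-fA-F]+)", text)
    if offset is None or value is None:
        raise ValueError("unexpected ccsrmem_rw output")
    return int(offset.group(1), 16), int(value.group(1), 16)


def read_dce_register(offset: int, base: Path = DEBUGFS_DCE) -> int:
    """Select a register offset, then read its value back through debugfs."""
    (base / "ccsrmem_addr").write_text(f"{offset:#x}\n")
    return parse_ccsrmem_rw((base / "ccsrmem_rw").read_text())[1]


# Parsing the DCE_IDLE output captured in the session above.
sample = "DCE register offset = 0x3c\nvalue = 0x00000001\n"
offset, value = parse_ccsrmem_rw(sample)
print(hex(offset), "idle" if value == 1 else "busy")
```

On a target, `read_dce_register(0x3F8)` would be expected to return the IP block revision shown above, provided debugfs is mounted.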

Page 23: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

DCE PERFORMANCE ANALYSIS

Page 24: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

DCE Test Procedure

Refer to drivers/staging/fsl_dce/tests/performance_simple/README for a detailed description of the sample DCE throughput performance tests.

ll performance_simple/

total 1864

-rw-r--r-- 1 r01360 r01360 8 Apr 12 16:09 built-in.o

-rw-r--r-- 1 r01360 r01360 25757 Apr 12 15:49 dce_perf_simple.c

-rw-rw-r-- 1 r01360 r01360 250048 Apr 12 16:10 dce_perf_simple.o

-rw-r--r-- 1 r01360 r01360 30428 Apr 12 15:49 dce_sf_perf_simple.c

-rw-rw-r-- 1 r01360 r01360 261984 Apr 12 16:10 dce_sf_perf_simple.o

-rw-rw-r-- 1 r01360 r01360 316717 Apr 12 16:10 dce_simple_perf_tester.ko

-rw-rw-r-- 1 r01360 r01360 3413 Apr 12 16:10 dce_simple_perf_tester.mod.c

-rw-rw-r-- 1 r01360 r01360 68192 Apr 12 16:10 dce_simple_perf_tester.mod.o

-rw-rw-r-- 1 r01360 r01360 250073 Apr 12 16:10 dce_simple_perf_tester.o

-rw-rw-r-- 1 r01360 r01360 328793 Apr 12 16:10 dce_simple_sf_perf_tester.ko

-rw-rw-r-- 1 r01360 r01360 3547 Apr 12 16:10 dce_simple_sf_perf_tester.mod.c

-rw-rw-r-- 1 r01360 r01360 68328 Apr 12 16:10 dce_simple_sf_perf_tester.mod.o

-rw-rw-r-- 1 r01360 r01360 262002 Apr 12 16:10 dce_simple_sf_perf_tester.o

-rw-r--r-- 4 r01360 r01360 221 Apr 12 15:49 Makefile

-rw-rw-r-- 1 r01360 r01360 0 Apr 12 16:10 modules.builtin

-rw-rw-r-- 1 r01360 r01360 167 Apr 12 16:10 modules.order

-rw-r--r-- 4 r01360 r01360 3355 Apr 12 15:49 README

Page 25: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

Kernel Space Performance Test

• Asynchronous

− Stateful test (uses the fsl_dce_stream.h layer): dce_simple_sf_perf_tester.

− Stateless test (uses the fsl_dce_chunk.h layer): dce_simple_perf_tester.

• Synchronous (uses the zdce.h DCE zlib layer)

− Stateful tests: zpipe and zspeed.

root@t4240qds:# insmod ./dce_perf_simple_test.ko test_mode=0 in_file="InputData1" out_file="InputData1.1_1.gz" comp_effort=1

Loading dce_perf_simple_test module

BMan data block size is 4096

DCE thread on cpu 1

Output length is 13249938

root@t4240qds:# rmmod dce_perf_test.ko

DCE Freq = 299999997 hz

CPU Freq: 1666666650

Cycles to complete = 28197315

Time (usec) to complete = 16925

Scaling factor (by 1000) = 1333

Total Input Bytes to Compress: 20000000

Input file size compression: 20000000 bytes

Compression thoughput: 9453 Mbps (12600 Mbps for 400 Mhz DCE)

Decompression thoughput: None
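The throughput figures in this log can be reproduced from the other values it prints. The sketch below assumes the test module uses truncating integer arithmetic, which is an inference from the numbers rather than something stated on the slide; the variable names are made up.

```python
total_input_bytes = 20_000_000   # "Total Input Bytes to Compress"
time_usec = 16_925               # "Time (usec) to complete"
dce_hz = 299_999_997             # "DCE Freq"
nominal_dce_hz = 400_000_000     # maximum DCE clock quoted earlier

# Bits per microsecond is numerically the same as megabits per second.
measured_mbps = total_input_bytes * 8 // time_usec

# The log scales the result to a 400 MHz DCE with a factor stored times 1000.
scaling_x1000 = nominal_dce_hz * 1000 // dce_hz
projected_mbps = measured_mbps * scaling_x1000 // 1000

print(measured_mbps, scaling_x1000, projected_mbps)
```

The projected figure is a linear extrapolation from a 300 MHz DCE clock to 400 MHz, so it assumes throughput scales with the engine clock and is not limited elsewhere in the system.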

Page 26: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

Kernel Space Stateless Performance Test

> modinfo dce_simple_perf_tester.ko

description: DCE loopback example

license: GPL

parm: verbose_level:verbosity level: 0 low, 1 is high (default=0) (int)

parm: bman_output:int

parm: test_mode:test_mode: 0 is compression, 1 is decompression (default=0) (int)

parm: b_sg_block_size_code:Size of bman buffers used to create s/g tables (default=4096) (int)

parm: b_sg_block_count:Number of s/g bman buffers to release (default=50) (int)

parm: b_dexp:Bman dexp value, default= (int)

parm: b_dmant:Bman dmant value, default= (int)

parm: block_size:Size of individual input data blocks in s/g (default=4096) (int)

parm: use_local_file:Use the included local header file for (de)compression. The value specifies the input size. Supported value are 0, 2, 4, 8, 12 (default=0) (int)

parm: comp_effort:Compression Effort, default=1 (int)

parm: in_file:Input file to (de)compress (charp)

parm: out_file:Output file result of (de)compression (charp)

parm: comp_ratio:The compresstion ratio to be used for allocated output data buffer (int)

parm: output_size:The extra output size to allocate (int)

parm: bman_data_size:The size of the data buffer pool (int)

Page 27: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

Kernel Space Stateful Performance Test

> modinfo dce_simple_sf_perf_tester.ko

description: DCE stateful performance test

license: Dual BSD/GPL

parm: verbose_level:verbosity level: 0 low, 1 is high (default=0) (int)

parm: bman_output:int

parm: test_mode:test_mode: 0 is compression, 1 is decompression (default=0) (int)

parm: b_sg_block_size_code:Size of bman buffers used to create s/g tables (default=4096) (int)

parm: b_sg_block_count:Number of s/g bman buffers to release (default=50) (int)

parm: b_dexp:Bman dexp value, default=12 (int)

parm: b_dmant:Bman dmant value, default=1 (int)

parm: block_size:Size of individual input data blocks in s/g (default=4096) (int)

parm: use_local_file:Use the included local header file for (de)compression. The value specifies the input size. Supported value are 0, 2, 4, 8, 12 (default=0) (int)

parm: comp_effort:Compression Effort, default=1 (int)

parm: in_file:Input file to (de)compress (charp)

parm: out_file:Output file result of (de)compression (charp)

parm: comp_ratio:The compresstion ratio to be used for allocated output data buffer (int)

parm: output_size:The extra output size to allocate (int)

parm: bman_data_size:The size of the data buffer pool in bytes (int)

parm: chunking_size:How much input bytes to send at a time (int)

Page 28: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

DPAA2: User Space Performance Test

• Asynchronous

− Stateful test (Uses fsl_dce_stream.h layer): dce_simple.

• Synchronous

− Stateful tests (use the zdce.h DCE zlib layer): zlib_test and zpipe.

• insmod fsl-dce-zspeed.ko in_file=/home/root/faketato num_threads=5

zpipe about to start test. The input file is /home/root/faketato. The output file is (null). The number of threads is 5. The number of loops is 1. The work unit size is 131072

Size of file: /home/root/faketato is 24736902

test wait produced 363493937

time according to jiffies = 28

1 loops took 0 secs. CPU Cycles 199441737. CPU frequency is 1799999982

total_total_in = 123684510

Rate is 8930211664 bps <- ~8.9Gbps

Finished test
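The reported rate can be cross-checked from the cycle count and CPU frequency in the log. This sketch assumes that `total_total_in` is the input file size multiplied by the number of threads, which matches the printed value, and that the rate is input bits divided by elapsed time; the test's exact rounding is not known, so the result is only expected to agree approximately.

```python
file_size = 24_736_902      # "Size of file"
num_threads = 5
cycles = 199_441_737        # "CPU Cycles"
cpu_hz = 1_799_999_982      # "CPU frequency"
reported_bps = 8_930_211_664

total_in = file_size * num_threads   # matches total_total_in in the log
elapsed_seconds = cycles / cpu_hz
rate_bps = total_in * 8 / elapsed_seconds

print(total_in, round(elapsed_seconds, 4), round(rate_bps / 1e9, 2))
```

The elapsed time works out to roughly 0.11 seconds, which is why the log's whole-second counter shows "0 secs" and the cycle counter is the more useful time base.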

Page 29: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

Demo: T4240 Development System

U-Boot 2016.012.0+g6dfe545 (Mar 30 2016 - 16:49:09 +0530)

CPU0: T4240E, Version: 2.0, (0x82480020)

Core: e6500, Version: 2.0, (0x80400120)

Clock Configuration:

CPU0:1666.667 MHz, CPU1:1666.667 MHz, CPU2:1666.667 MHz, CPU3:1666.667 MHz,

CPU4:1666.667 MHz, CPU5:1666.667 MHz, CPU6:1666.667 MHz, CPU7:1666.667 MHz,

CPU8:1666.667 MHz, CPU9:1666.667 MHz, CPU10:1666.667 MHz, CPU11:1666.667 MHz,

CCB:733.333 MHz,

DDR:933.333 MHz (1866.667 MT/s data rate) (Asynchronous), IFC:183.333 MHz

FMAN1: 733.333 MHz

FMAN2: 733.333 MHz

QMAN: 366.667 MHz

PME: 533.333 MHz

L1: D-cache 32 KiB enabled

I-cache 32 KiB enabled

Reset Configuration Word (RCW):

00000000: 16070019 18101916 00000000 00000000

00000010: 04022828 00558c00 ec020000 f5000000

00000020: 00000000 ee0000ee 00000000 000307fc

00000030: 00000000 00000000 00000000 00000028

I2C: ready

Board: T4240QDS, Sys ID: 0x1e, Sys Ver: 0x22, vBank: 4

FPGA: v3 (T4240QDS_2012_1113_1114), build 438 on Tue Nov 13 17:14:23 2012

SERDES Reference Clocks: SERDES1=125MHz SERDES2=125MHz SERDES3=100MHz SERDES4=100MHz

Page 30: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

Gzip vs. DCE: 20x Performance Improvement

• Test Input and output (in ramdisk)

-rw-r--r-- 1 root root 1.1G Apr 13 09:47 dce_webdata1G

-rw-r--r-- 1 root root 177M Apr 13 09:50 dce_webdata1G.gz

• Test Time

gzip command: time gzip dce_webdata1G

DCE command: time insmod fsl-dce-zspeed.ko mode=0 num_threads=1 in_file=dce_webdata1G

        gzip         DCE
real    1m58.001s    0m5.016s
user    1m55.226s    0m0.001s
sys     0m2.769s     0m3.456s

• CPU Utilization (mpstat)

Linux 3.12.37-rt51-01298-g283a93c-dirty (t4240qds) 04/13/16 _ppc64_ (24 CPU)

09:48:18 CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle

09:48:20 7 51.76 0.00 1.01 0.00 0.00 0.00 0.00 0.00 0.00 47.24

09:48:22 7 98.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

09:48:24 7 97.50 0.00 2.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00

Or

Average: all 3.80 0.00 0.09 0.00 0.01 0.00 0.00 0.00 0.00 96.10

Average: 7 91.28 0.00 1.93 0.00 0.07 0.00 0.00 0.00 0.00 6.72

Vs

10:08:46 0 0.00 0.00 32.16 0.00 0.00 0.00 0.00 0.00 0.00 67.84

10:08:48 0 0.00 0.00 99.50 0.00 0.50 0.00 0.00 0.00 0.00 0.00

10:08:50 0 0.00 0.00 41.11 0.00 7.22 0.00 0.00 0.00 0.00 51.67

Or

Average: all 0.01 0.00 1.78 0.00 0.08 0.01 0.00 0.00 0.00 98.12

Average: 0 0.00 0.00 43.26 0.00 1.80 0.00 0.00 0.00 0.00 54.94
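The headline speedup can be checked against the `time` output on this slide. The sketch below simply converts the reported times to seconds and takes ratios; the helper name `to_seconds` is made up, and the comparison covers wall-clock time and CPU time (user plus sys) only.

```python
def to_seconds(text: str) -> float:
    """Convert a `time` value such as '1m58.001s' to seconds."""
    minutes, seconds = text.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


gzip_times = {"real": "1m58.001s", "user": "1m55.226s", "sys": "0m2.769s"}
dce_times = {"real": "0m5.016s", "user": "0m0.001s", "sys": "0m3.456s"}

wall_speedup = to_seconds(gzip_times["real"]) / to_seconds(dce_times["real"])

gzip_cpu = to_seconds(gzip_times["user"]) + to_seconds(gzip_times["sys"])
dce_cpu = to_seconds(dce_times["user"]) + to_seconds(dce_times["sys"])
cpu_ratio = gzip_cpu / dce_cpu

print(f"wall-clock speedup: {wall_speedup:.1f}x")
print(f"CPU time ratio: {cpu_ratio:.1f}x")
```

Both ratios come out above the 20x in the slide title. The DCE figure also includes module load and file I/O time, so it understates the raw engine throughput.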

Page 31: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

Summary

• The Decompression and Compression Engine (DCE) is an accelerator compatible with the Data Path Acceleration Architecture (DPAA). It provides lossless data decompression and compression for the QorIQ family of SoCs.

• DCE is a perfect complement for devices that need to optimize data storage and network bandwidth. DCE is standards-compliant. It offers fast execution and frees the CPU to focus on other applications.

• The Software Development Kit provides drivers and APIs for developers to adopt DCE in their applications. Reference applications are available for both user space and kernel space.

• NXP FTF Sessions:

− FTF-NET-N1878 Storage - Complete Storage Dedupe offload using NXP's Intelligent Storage Accelerator. Next gen SDS platform

• Demo

− QorIQ Leadership Security Functionality

• Further Reference

− QorIQ T4240 Data Path Acceleration Architecture (DPAA) Reference Manual (T4240DPAARM)

− Data Path Acceleration Architecture, Second Generation (DPAA2) Hardware Reference Manual (DPAA2RM)

Page 32: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE
Page 33: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

Software Products and Services

Simplify Software Engagement with NXP

Deliver Commercial Software, Support, Services and Solutions

Accelerate Customer Time-to-Market

Create Success!

Categories shown on the slide: Linux® Services; Integration Services; Development Tools; Solutions; Reference; Runtime Products

• Security Consulting

• Hardened Linux

• IOT Gateway

• OpenWRT+

• CodeWarrior

• VortiQa Software Solutions

• Commercial Support

• Performance Tuning

Visit us in the Tech Lab – #247

Find us online at www.nxp.com/networking-services

Page 34: INTRODUCE QorIQ HARDWARE COMPRESSION ENGINE

ATTRIBUTION STATEMENT

NXP, the NXP logo, NXP SECURE CONNECTIONS FOR A SMARTER WORLD, CoolFlux, EMBRACE, GREENCHIP, HITAG, I2C BUS, ICODE, JCOP, LIFE VIBES, MIFARE, MIFARE Classic, MIFARE DESFire, MIFARE Plus, MIFARE FleX, MANTIS, MIFARE ULTRALIGHT, MIFARE4MOBILE, MIGLO, NTAG, ROADLINK, SMARTLX, SMARTMX, STARPLUG, TOPFET, TrenchMOS, UCODE, Freescale, the Freescale logo, AltiVec, C 5, CodeTEST, CodeWarrior, ColdFire, ColdFire+, C Ware, the Energy Efficient Solutions logo, Kinetis, Layerscape, MagniV, mobileGT, PEG, PowerQUICC, Processor Expert, QorIQ, QorIQ Qonverge, Ready Play, SafeAssure, the SafeAssure logo, StarCore, Symphony, VortiQa, Vybrid, Airfast, BeeKit, BeeStack, CoreNet, Flexis, MXC, Platform in a Package, QUICC Engine, SMARTMOS, Tower, TurboLink, and UMEMS are trademarks of NXP B.V. All other product or service names are the property of their respective owners. ARM, AMBA, ARM Powered, Artisan, Cortex, Jazelle, Keil, SecurCore, Thumb, TrustZone, and μVision are registered trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. ARM7, ARM9, ARM11, big.LITTLE, CoreLink, CoreSight, DesignStart, Mali, mbed, NEON, POP, Sensinode, Socrates, ULINK and Versatile are trademarks of ARM Limited (or its subsidiaries) in the EU and/or elsewhere. All rights reserved. Oracle and Java are registered trademarks of Oracle and/or its affiliates. The Power Architecture and Power.org word marks and the Power and Power.org logos and related marks are trademarks and service marks licensed by Power.org. © 2015–2016 NXP B.V.