External Use
TM
Overview of the QuadSPI Protocol
Supported by QorIQ
Communications Processors
FTF-NET-F0008
APR. 2014
Jimmy Zhao | PE, System and Applications Engineering
Session Introduction
• Introduction of Quad Serial Peripheral Interface (QuadSPI)
− Current Trend for Serial Flash Memory
• Help you determine
− Whether the QuadSPI interface is the right solution for your system
− How to use the QuadSPI
• Time allocation
− 45 - 50 min presentation
− 10 min Q&A
• Author
− Jimmy Zhao, PE, System and Application Engineer, Digital Networking
− SME for USB, SPI, SDHC
Session Objectives
• After completing this session you will be able to:
− Understand the current trend for serial flash memory
− Know the LS1 QuadSPI information and whether QuadSPI belongs in your system
− Understand the QuadSPI controller’s:
Features and capabilities
Different operation modes to reduce power
Programmable Sequence Engine
Parallel Mode
− Know how to use QuadSPI in different data paths: single, dual, quad
Agenda
• LS1 QuadSPI information
• SPI Memory Background
• QuadSPI Introduction
− Features
− Architecture
− Buffers
− Programmable sequence engine
− Serial Flash Memory (SFM) commands/programming
− Parallel Mode
− Programmable sequence engine examples
− Data sampling
LS1 QuadSPI information
• Dual QuadSPI architecture supports:
− Two external serial flashes per QuadSPI module
− Programmable Sequence Engine for compatibility to any Serial flash
− XIP (Execute-In-Place)
− Supports up to 4 chip selects
• Up to 83 MHz clock
• Flexible Receive (RX) Buffering Scheme:
− Sub-buffers allocated to specific masters
− Master prioritisation
− Pre-fetch capability
− Suspend & resume for lower priority masters
• Boot from QuadSPI
SPI Memory Background
• Serial Peripheral Interface (Flash devices):
− Communications interface between CPU and external flash memory
− Interface similar to standard SPI but optionally uses 2 (Dual) or 4 (Quad) data lines to transfer data
− Can also support DDR (Double Data Rate) mode to further increase throughput
− Command-driven interface
− Supports both 24-bit and 32-bit addressing.
− Bandwidth: Up to 333 MB/s
− Future: OctalSPI (8 data lines) flash devices coming
• Use Cases
− Boot code
− Shadowing for Store and Download systems
− XIP
− Low-end Data storage
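As a rough illustration of how the number of data lines and DDR mode scale the raw transfer rate, the sketch below computes a theoretical peak for a given clock. The function name is hypothetical, and the 83 MHz value is simply the LS1 maximum clock quoted earlier in this deck. Command, address, and dummy cycles are ignored, so real throughput will be lower.

```python
def peak_throughput_mb_s(clock_mhz, data_lines, ddr=False):
    """Theoretical peak data rate in MB/s, ignoring command/address overhead."""
    # DDR transfers data on both clock edges, doubling the bits per clock.
    bits_per_clock = data_lines * (2 if ddr else 1)
    return clock_mhz * bits_per_clock / 8


for name, lines in (("single", 1), ("dual", 2), ("quad", 4)):
    sdr = peak_throughput_mb_s(83, lines)
    ddr = peak_throughput_mb_s(83, lines, ddr=True)
    print(f"{name:6s} SDR {sdr:7.3f} MB/s   DDR {ddr:7.3f} MB/s")
```

Under these assumptions, a quad interface at 83 MHz peaks at 41.5 MB/s in SDR mode and 83 MB/s in DDR mode.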
SPI Memory Background
• No industry standard; Motorola SPI is the de facto standard
• Most key commands are common to all manufacturers
• Memory flash device suppliers: [vendor logos not reproduced]
• The QuadSPI module is flexible enough to support all commands from all manufacturers
− Using its flexible core engine
QuadSPI Introduction
• Dual QuadSPI architecture
− Two external Flashes per QuadSPI module
− Programmable Sequence Engine for compatibility to any serial flash
− Support up to four chip selects
− Dual-die support
• Control two four-bit serial flashes
− Individual Mode
− Parallel Mode enabling “octal flash” with data combination internally (Read only)
• Single, dual, and quad mode (octal on the way)
• DDR mode (Double Data Rate, transferring data on both edges of the clock)
• DMA support (Direct Memory Access)
QuadSPI Features
• Serial Flash Mode
− Data access over the AHB bus or as a peripheral
− Flash is memory mapped and all read accesses are automatically carried out
− Read, write, and erase can be implemented over the peripheral bus
− DMA and interrupt support to read RX buffer data via the AHB bus or the IP (Freescale Intellectual Property Interface) register space
AMBA: Advanced Microcontroller Bus Architecture
AHB: AMBA High-performance Bus
SFM: Serial Flash Memory
[Figure: block diagram showing the Sflash clock domain and the host clock domain]
QuadSPI Operation Modes
• Normal Mode
− Allowed to communicate with an external serial flash device
• Module Disable Mode
− The clock to the non-memory mapped logic can be stopped
• Stop Mode
− The system clocks to the QuadSPI block may be shut off
External Signals (Compared to eSPI)
Signals      QuadSPI                                          eSPI
Chip Select  QSPI_CS_A0, QSPI_CS_A1, QSPI_CS_B0, QSPI_CS_B1   SPI_CS_B0, SPI_CS_B1, SPI_CS_B2, SPI_CS_B3
Clock        QSPI_CK_A, QSPI_CK_B                             SPI_CLK
Data I/O     QSPI_DIO_A[3:0], QSPI_DIO_B[3:0]                 SPI_MOSI, SPI_MISO
Data Strobe  QSPI_DQS_A, QSPI_DQS_B                           -
PCS: (Peripheral) Chip Select
FA: Flash A, FB: Flash B
DQS: Data strobe
Block Diagrams
• Quad Bit Data Path
[Figure: quad-bit data path using pads IO0, IO1, IO2, and IO3]
Programmable Sequence Engine
• Works on a set of “instruction-operand” pairs programmed by the user
• Configures the QuadSPI module according to the serial flash from different vendors
• Sequentially executes the “instructions” to carry out the required task
• Drives the flexible I/O controller to generate serial flash patterns
• Each instruction-operand pair is 16 bits wide: INSTR (6 bits), PADs (2 bits), OPERAND (8 bits)
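The slide gives the field widths of one instruction-operand pair as 6, 2, and 8 bits. The sketch below packs and unpacks such a 16-bit entry, assuming INSTR occupies the most significant bits, PADs the next two, and OPERAND the low byte. The pad codes 0x0 (1 pad) and 0x2 (4 pads) appear in the examples later in this deck; 0x1 for 2 pads is assumed. The opcode constant is a placeholder, not a value from the slides, so take real opcodes from the reference manual.

```python
PAD_CODE = {1: 0x0, 2: 0x1, 4: 0x2}          # number of pads -> 2-bit PADs field
PAD_COUNT = {code: pads for pads, code in PAD_CODE.items()}

OPCODE_CMD = 0x01  # placeholder opcode for illustration only (assumption)


def encode_lut_entry(instr, pads, operand):
    """Pack INSTR (6 bits), PADs (2 bits) and OPERAND (8 bits) into 16 bits."""
    if not 0 <= instr < 64:
        raise ValueError("INSTR must fit in 6 bits")
    if not 0 <= operand < 256:
        raise ValueError("OPERAND must fit in 8 bits")
    return (instr << 10) | (PAD_CODE[pads] << 8) | operand


def decode_lut_entry(entry):
    """Return (instr, pads, operand) from a 16-bit entry."""
    return (entry >> 10) & 0x3F, PAD_COUNT[(entry >> 8) & 0x3], entry & 0xFF


entry = encode_lut_entry(OPCODE_CMD, 4, 0xEB)
print(hex(entry), decode_lut_entry(entry))
```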
Programmable Sequence Engine: Look-up Table (LUT)
• The sequences for the particular flash on the board are stored in a LUT
• The LUT consists of
− up to 16 pre-programmed sequences
− up to 8 instruction-operand pairs per sequence
• Every buffer register and the IPS configuration register selects a sequence in the LUT
• Writing the QSPIn_IPCR[SEQID] field with a LUT index triggers execution
[Figure: LUT holding example sequences (fast read, dual I/O read, quad I/O read, quad I/O XIP read, 2 x I/O DTR read, 4 x I/O DTR read, 4 x I/O page program), selected by seq_id_buf0 to seq_id_buf3 and IPS_seq_id]
Programmable Sequence Engine (cont.)
• At reset, one sequence is programmed that reads 8 bytes of data from the serial flash using a 24-bit address on a single I/O:
Instruction  Pad  Operand  Comment
CMD          0x0  0x03     Read Data byte command on one pad
ADDR         0x0  0x18     24 Addr bits to be sent on one pad
READ         0x0  0x08     Read 64 bits
JMP_ON_CS    0x0  0x00     Jump to instruction 0 (CMD)
• The user must pre-populate the LUT
• The LUT may be locked to protect its contents from being changed
• The process for locking and unlocking the LUT:
− Locking the LUT
Write the key (0x5AF05AF0) to QSPI_LUTKEY
Write 0b01 to QSPI_LCKCR
− Unlocking the LUT
Write the key (0x5AF05AF0) to QSPI_LUTKEY
Write 0b10 to QSPI_LCKCR
(Note that the two writes should immediately follow each other)
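The lock and unlock handshake on this slide can be sketched against a toy register model. The class below is purely illustrative: it only mimics the rule that the key write to QSPI_LUTKEY must be immediately followed by the write to QSPI_LCKCR. The class name, the string register names used as dictionary-style keys, and the unlocked starting state are assumptions, not documented hardware behaviour.

```python
LUT_KEY = 0x5AF05AF0
LOCK, UNLOCK = 0b01, 0b10


class LutLockModel:
    """Toy model of the key-then-command handshake (not real hardware)."""

    def __init__(self):
        self.locked = False       # starting state is an assumption
        self._key_armed = False

    def write(self, reg, value):
        if reg == "QSPI_LUTKEY":
            self._key_armed = value == LUT_KEY
        elif reg == "QSPI_LCKCR":
            if self._key_armed and value == LOCK:
                self.locked = True
            elif self._key_armed and value == UNLOCK:
                self.locked = False
            self._key_armed = False
        else:
            # Any other access breaks the back-to-back pair.
            self._key_armed = False


def lock_lut(dev):
    dev.write("QSPI_LUTKEY", LUT_KEY)
    dev.write("QSPI_LCKCR", LOCK)


def unlock_lut(dev):
    dev.write("QSPI_LUTKEY", LUT_KEY)
    dev.write("QSPI_LCKCR", UNLOCK)


dev = LutLockModel()
lock_lut(dev)
print("locked:", dev.locked)
unlock_lut(dev)
print("locked:", dev.locked)
```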
Programmable Sequence Engine: Instruction set
Pads: N = {1, 2, 4}
Instruction  Operand                 Action
CMD          Command value           Provide the SF with the operand on N pads
ADDR         Number of address bits  Provide the SF with address cycles according to the operand on N pads (address can be memory mapped or register mapped)
ADDR_DDR     Number of address bits  Provide the SF with address cycles according to the operand on N pads in DDR mode (address can be memory mapped or register mapped)
DUMMY        Number of dummy cycles  Provide the SF with dummy cycles according to the operand
MODE         Mode value              Provide the SF with the operand on N pads
MODE_DDR     Mode value              Provide the SF with the operand on N pads in DDR mode
READ         Read data size          Read the data from the SF via N pads
READ_DDR     Read data size          Read the data from the SF in DDR mode via N pads
JUMP_ON_CS   Instruction number      Every time the CS is deasserted, jump to the instruction index specified by the operand
WRITE        Write data size         Write the data to the SF on N pads
WRITE_DDR    Write data size         Write the data to the SF in DDR mode on N pads
Programmable Sequence Engine: Example
• Fast Read (Macronix/Numonyx/Spansion/Winbond)
S.No  Instruction  Pads  Operand  Comment
1     CMD          0     0x0B     Fast Read command = 0x0B
2     ADDR         0     0x18     24 address bits
3     DUMMY        0     0x08     8 dummy cycles
4     READ         0     0x04     Read 32 bits on 1 pad
5     JUMP_ON_CS   0     0x00     Jump to instruction 0 (CMD)
Programmable Sequence Engine: Example
• 4 x I/O Read for Macronix
S.No  Instruction  Pads     Operand  Comment
1     CMD          0x0 (1)  0xEB     EB = 4xI/O read command
2     ADDR         0x2 (4)  0x18     24 Addr bits to be sent on 4 pads
3     MODE         0x2 (4)  0xA5     Performance enhance mode
4     DUMMY        0x0      0x4      4 Dummy cycles
5     READ         0x2 (4)  0x04     Read 32 Bits on 4 pads
6     JUMP_ON_CS   0x0      0x01     Jump to instruction 1 (ADDR)
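To see what the jump to instruction 1 buys, the sketch below counts serial clock cycles for the 4 x I/O read sequence above. It assumes SDR transfers and reads the operands the way the slide comments suggest: address bits for ADDR, clock cycles for DUMMY, and bytes for READ. The pad code 0x1 (2 pads) and the helper names are assumptions for illustration.

```python
import math

PADS = {0x0: 1, 0x1: 2, 0x2: 4}   # PADs field -> number of data lines

SEQ_4X_IO_READ = [
    ("CMD", 0x0, 0xEB),
    ("ADDR", 0x2, 0x18),
    ("MODE", 0x2, 0xA5),
    ("DUMMY", 0x0, 0x04),
    ("READ", 0x2, 0x04),
    ("JUMP_ON_CS", 0x0, 0x01),
]


def clock_cycles(instr, pad_code, operand):
    """Serial clock cycles one instruction takes in SDR mode."""
    pads = PADS[pad_code]
    if instr in ("CMD", "MODE"):
        return 8 // pads                   # operand is one 8-bit value
    if instr == "ADDR":
        return math.ceil(operand / pads)   # operand = number of address bits
    if instr == "DUMMY":
        return operand                     # operand = number of cycles
    if instr == "READ":
        return operand * 8 // pads         # operand = number of bytes
    return 0                               # JUMP_ON_CS transfers nothing


def sequence_cycles(seq, start=0):
    return sum(clock_cycles(*step) for step in seq[start:])


print("first access:", sequence_cycles(SEQ_4X_IO_READ))
print("after jump to instruction 1:", sequence_cycles(SEQ_4X_IO_READ, start=1))
```

Under these assumptions the first access takes 28 cycles, and every later access that skips the command phase takes 20.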
Serial Flash Memory (SFM) Commands
1. Populate the LUT
2. Start executing the instructions
− IP Commands
QSPI_SFAR: Read/write address
QSPI_IPCR: Data size, Sequence ID (SEQID)
− AHB Commands
QSPI_BUFxCR: SEQID
3. Communication with flash started
QSPI_SR[BUSY]: 1
4. Transaction finished
QSPI_SR[BUSY]: 0
QSPI_FR[TFF]: 1 for an IP command
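The IP command flow above (program the address, write the sequence ID, wait for BUSY to clear, check TFF) can be sketched against a mock controller. Everything below is a simulation: the `MockQuadSPI` class, the number of polls for which BUSY stays set, and the bit positions used to pack SEQID and data size into QSPI_IPCR are assumptions for illustration, not values taken from the slides.

```python
class MockQuadSPI:
    """Toy stand-in: BUSY stays set for a few polls, then TFF is raised."""

    def __init__(self, busy_polls=3):
        self.regs = {"QSPI_SFAR": 0, "QSPI_IPCR": 0}
        self.busy = 0
        self.tff = 0
        self._busy_polls = busy_polls
        self._remaining = 0

    def write(self, name, value):
        self.regs[name] = value
        if name == "QSPI_IPCR":        # writing SEQID triggers execution
            self.busy, self.tff = 1, 0
            self._remaining = self._busy_polls

    def poll_busy(self):
        """Return the value of QSPI_SR[BUSY] seen by this read."""
        if self.busy:
            if self._remaining == 0:
                self.busy, self.tff = 0, 1
            else:
                self._remaining -= 1
        return self.busy


def run_ip_command(dev, address, seqid, size, max_polls=1000):
    dev.write("QSPI_SFAR", address)
    # Field positions in QSPI_IPCR are assumed for this sketch.
    dev.write("QSPI_IPCR", (seqid << 24) | size)
    for _ in range(max_polls):
        if dev.poll_busy() == 0:
            return dev.tff == 1        # transaction finished flag
    raise TimeoutError("QSPI_SR[BUSY] never cleared")


dev = MockQuadSPI()
print("finished:", run_ip_command(dev, address=0x1000, seqid=2, size=8))
```

Bounding the poll loop with `max_polls` avoids hanging forever if the flash never responds.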
Flash Programming
1. Populate the LUT
2. Clear TX buffer
− QSPI_MCR[CLR_TXF] =1 if QSPI_SR[TXNE] = 1
3. Program QSPI_SFAR
4. Write data to QSPI_TBDR => TX buffer
5. Program QSPI_IPCR -- SEQID = index of LUT
6. Keep writing data to QSPI_TBDR to finish
− QSPI_TBSR[TRCTR]: Tx counter
• Tx Buffer: circular FIFO with 16 entries of 4 bytes each
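Because the TX buffer holds only 16 entries of 4 bytes, a full page cannot be queued in one go. The sketch below models steps 4 to 6: pre-fill the buffer, then keep topping it up as the controller drains it. The little-endian word packing, the 0xFF tail padding, and the one-word-at-a-time drain are assumptions chosen to keep the model simple.

```python
from collections import deque

TX_ENTRIES = 16    # from the slide: 16 entries
ENTRY_BYTES = 4    # of 4 bytes each


def to_words(data):
    """Split bytes into 4-byte little-endian words, padding the tail with 0xFF."""
    padded = data + b"\xff" * (-len(data) % ENTRY_BYTES)
    return [int.from_bytes(padded[i:i + ENTRY_BYTES], "little")
            for i in range(0, len(padded), ENTRY_BYTES)]


def program(data):
    """Return (prefilled, streamed) word counts for one program command."""
    words = deque(to_words(data))
    fifo = deque()
    # Step 4: fill the TX buffer before the command is triggered.
    while words and len(fifo) < TX_ENTRIES:
        fifo.append(words.popleft())
    prefilled = len(fifo)
    # Steps 5-6: once QSPI_IPCR is written the controller drains the FIFO;
    # keep writing QSPI_TBDR until every word has been handed over.
    streamed = 0
    while fifo:
        fifo.popleft()                  # controller shifts one word out
        if words:
            fifo.append(words.popleft())
            streamed += 1
    return prefilled, streamed


print(program(bytes(256)))   # one 256-byte page
```

For a 256-byte page this model pre-fills 16 words and streams the remaining 48 while the transfer is in flight.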
Flash Read
A. Reading data into QuadSPI
− IP Command Read => Rx Buffer
1. Populate the LUT
2. Program QSPI_SFAR
3. Program QSPI_IPCR -- SEQID = index of LUT
QSPI_SR[IP_ACC] =1 (busy)
− AHB Command Read => AHB buffer
1. Populate the LUT
2. Setup an address range mapped to an external flash device
• QSPI_AMBA_BASE, TOP_ADDR_MEMA2
3. QSPI_BUFxCR => MSTRID
Flash Read (Continued)
B. Data Transfer from QuadSPI internal buffer
− Rx Buffer: 32 entries of 4 bytes each
QSPI_RBDR0 to QSPI_RBDR31 = QSPI_ARDB0 to QSPI_ARDB31
QSPI_RBCT[WMRK] => (watermark + 1)
− Flag-based data read: QSPI_SR[RXWE] = 1
Write 1 to QSPI_FR[RBDF]
− DMA controller data read: QSPI_RSER[RBDDE]=1
QSPI_SR[RXWE] = 1
− AHB Buffer data read via memory mapped access
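The flag-based read path can be sketched as a loop: while QSPI_SR[RXWE] is set, read (watermark + 1) entries from the RX buffer data registers, then write 1 to QSPI_FR[RBDF] to release them. The model below is a simplification built on that reading of the slide. The class, its method names, and the exact condition that sets RXWE are assumptions.

```python
class RxBufferModel:
    """Toy RX buffer: a list of received 32-bit words plus a watermark."""

    def __init__(self, words, wmrk):
        self.pending = list(words)   # words already received from the flash
        self.wmrk = wmrk             # QSPI_RBCT[WMRK]

    @property
    def rxwe(self):
        """QSPI_SR[RXWE]: modelled as set once more than WMRK entries wait."""
        return int(len(self.pending) > self.wmrk)

    def read_rbdr(self, index):
        """Read QSPI_RBDRn without removing the entry."""
        return self.pending[index]

    def pop(self):
        """Modelled effect of writing 1 to QSPI_FR[RBDF]: drop WMRK + 1 entries."""
        del self.pending[:self.wmrk + 1]


def drain(rx):
    out = []
    while rx.rxwe:
        out.extend(rx.read_rbdr(i) for i in range(rx.wmrk + 1))
        rx.pop()
    return out


rx = RxBufferModel(range(10), wmrk=3)
print(drain(rx), "left in buffer:", rx.pending)
```

In this model, entries that never reach the watermark stay in the buffer, so a real driver would need a separate way to collect the tail of a transfer.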
Address Scheme
• Regular 24-bit address
• 32-bit addressing
− Extended address mode
Convert a 24-bit command to a 32-bit address command
ADDR/ADDR_DDR = 32
− Extended address register
Holds the upper 8 bits of the 32-bit address
Divides the flash into banks of 16 MB
A command is needed to change the upper 8 bits
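With the extended address register approach, a 32-bit flash address splits into an 8-bit bank number and a 24-bit offset inside a 16 MB bank. A minimal sketch of that split (the function name is hypothetical):

```python
BANK_SIZE = 16 * 1024 * 1024   # 16 MB, the range covered by 24 address bits


def split_address(addr32):
    """Return (bank, offset24) for a 32-bit flash address."""
    if not 0 <= addr32 <= 0xFFFFFFFF:
        raise ValueError("address must fit in 32 bits")
    return addr32 >> 24, addr32 & 0xFFFFFF


bank, offset = split_address(0x01234567)
print(f"bank {bank:#04x}, offset {offset:#08x}")
```

The bank value is what would go into the flash's extended address register, while the offset is sent with the ordinary 24-bit command.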
Memory Mapped Serial Flash Data
Memory Mapped Address                  Device  Flash Address
AMBA_BASE+0x00 to SFA1AD-0x04          A1      Flash A: 0x00_0000, 0x00_0004, …
SFA1AD+0x00 to SFA2AD-0x04             A2      Flash A: 0x00_0000, 0x00_0004, …
SFA2AD+0x00 to SFB1AD-0x04             B1      Flash B: 0x00_0000, 0x00_0004, …
SFB1AD+0x00 to SFB2AD-0x04             B2      Flash B: 0x00_0000, 0x00_0004, …
[Figure: memory map with four consecutive windows A1, A2, B1, and B2 starting at AMBA_BASE and bounded by SFA1AD, SFA2AD, SFB1AD, and SFB2AD]
Flash Memory
Max density on market: 1Gb
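The memory map above can be read as four back-to-back windows, each bounded by the next top-address value. The sketch below decodes a memory-mapped address into a device name and a flash address. It assumes individual (non-parallel) mode and that the flash address restarts at 0 in each window, as the figure suggests. The base and top values in the example are made up.

```python
def decode(addr, amba_base, sfa1ad, sfa2ad, sfb1ad, sfb2ad):
    """Map a memory-mapped address to (device, flash_address)."""
    windows = [
        ("A1", amba_base, sfa1ad),
        ("A2", sfa1ad, sfa2ad),
        ("B1", sfa2ad, sfb1ad),
        ("B2", sfb1ad, sfb2ad),
    ]
    for name, base, top in windows:
        if base <= addr < top:
            return name, addr - base
    raise ValueError("address is outside the QuadSPI memory-mapped range")


# Hypothetical layout: four 16 MB devices starting at 0x4000_0000.
layout = dict(amba_base=0x4000_0000, sfa1ad=0x4100_0000, sfa2ad=0x4200_0000,
              sfb1ad=0x4300_0000, sfb2ad=0x4400_0000)
print(decode(0x4100_0004, **layout))
```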
Flexible Multi-Master Access
• Reduce read latency from serial flash
• 4 flexible circular buffers (merged into one)
• Each buffer
− has its own rd/wr pointers
− has its own base address
− is associated with 1 AHB master
− has a datasize: the amount of data to be fetched on every “missed” access
• Buffer 3: Option of being associated with no master (default access to all masters not associated with any buffer)
• Buffer 0: Can be configured in a high-priority mode
[Figure: Buffers 0 to 3 placed back to back in one memory of parameterizable maximum size, with boundaries BUF0IND, BUF1IND, and BUF2IND]
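The master-to-buffer association can be sketched as a simple lookup: a master uses the buffer configured for its ID, and Buffer 3 optionally acts as the catch-all for every other master. The function and parameter names are hypothetical, and returning `None` when nothing matches is a modelling choice, not documented hardware behaviour.

```python
def select_buffer(master_id, buffer_masters, buffer3_catch_all=True):
    """Pick the AHB buffer index (0-3) for a master.

    buffer_masters: master ID configured for each of the four buffers.
    """
    for index, owner in enumerate(buffer_masters):
        if index == 3 and buffer3_catch_all:
            continue                 # Buffer 3 is not tied to one master
        if owner == master_id:
            return index
    return 3 if buffer3_catch_all else None


owners = [1, 5, 7, None]
print(select_buffer(5, owners))   # master with its own buffer
print(select_buffer(9, owners))   # unassigned master falls back to Buffer 3
```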
QuadSPI Parallel Access Mode
• This mode allows two identical serial flash devices to be connected and accessed in parallel, forming one (virtual) flash memory with doubled readout bandwidth
• Parallel Flash Mode is valid only for commands related to data read from the serial flash. Writes still occur on 4-bit basis, so when programming, data must be split
• Configuration in dual-die parallel mode:
− The 1st device of Flash A has to be paired with the 1st device of Flash B
− The 2nd device of Flash A has to be paired with the 2nd device of Flash B
Parallel Mode
• Doubles read throughput
• Only Read operations allowed
• Paired devices must be the same size:
− A1=B1
− A2=B2
• An example (256MB):
− A1/B1 pair: Flash Address = (Memory mapped address - QSPI_AMBA_BASE)/2
Incoming address: 0x1000_0000
Flash address:
0x1000_0000 -> 0x0 (the 1st address of flash A1 & B1)
0x1000_0004 -> 0x2
0x1000_0008 -> 0x4, etc.
− A2/B2 pair: Flash Address = (Memory mapped address - SFA2AD)/2
Incoming address: 0x3000_0000
Flash address:
0x3000_0000 -> 0x0 (the 1st address of flash A2 & B2)
0x3000_0004 -> 0x2, etc.
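The address translation in this example reduces to one subtraction and a divide by two, because each flash of a pair supplies half of every access. A minimal sketch follows. The function name is hypothetical, and the pair bases 0x1000_0000 and 0x3000_0000 are the values implied by the example, not fixed hardware addresses.

```python
def parallel_flash_address(mapped_addr, pair_base):
    """Flash address sent to both devices of a pair in parallel mode."""
    offset = mapped_addr - pair_base
    if offset < 0:
        raise ValueError("address is below the base of this pair")
    return offset // 2


A1_B1_BASE = 0x1000_0000   # QSPI_AMBA_BASE in the example
A2_B2_BASE = 0x3000_0000   # SFA2AD in the example

for addr in (0x1000_0000, 0x1000_0004, 0x1000_0008):
    print(f"{addr:#x} -> {parallel_flash_address(addr, A1_B1_BASE):#x}")
print(f"{0x3000_0004:#x} -> {parallel_flash_address(0x3000_0004, A2_B2_BASE):#x}")
```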
QuadSPI Data Sampling Delay
• Flash samples incoming write data on the positive edge of the flash clock
• Flash generates outgoing read data on the negative edge of the flash clock
• The QuadSPI internal clock is the inverse of the serial flash clock
• Sampling Register (QuadSPI_SMPR[FSDLY, HSDLY, FSPHS, HSPHS])
QuadSPI Data Sampling: SDR Mode
• Sampling Register (QuadSPI_SMPR[FSDLY, HSDLY, FSPHS, HSPHS])
QuadSPI Data Sampling: DDR Mode
• Sampling Register (QuadSPI_SMPR[DDRSMP])
Session Summary
• Gave background on serial flash memory and its suppliers
• Introduced QuadSPI information on LS1
• Discussed QuadSPI controller’s
− Features and capabilities
− Different operation modes to reduce power
− Programmable Sequence Engine
− Memory maps
− Parallel Mode
• Gave examples for programmable sequence engine
• Gave data sampling examples for SDR and DDR modes
For Further Information
• URLs
− http://www.freescale.com/webapp/sps/site/prod_summary.jsp?code=LS1021A&nodeId=018rH325E4017B
− http://www.spansion.com/Products/memory/Serial-Flash/Pages/Spansion%20FL.aspx
− http://www.micron.com/products/nor-flash/serial-nor-flash
− http://www.winbond.com/hq/enu/ProductAndSales/ProductLines/FlashMemory/SerialFlash/
− http://www.macronix.com/CachePages/en-us-Product-NORFlash-SerialFlash.aspx#1Gb
• Contact information
− Jimmy Zhao, PE, System and Applications Engineering, DN
Introducing The
QorIQ LS2 Family
Breakthrough,
software-defined
approach to advance
the world’s new
virtualized networks
New, high-performance architecture built with ease-of-use in mind Groundbreaking, flexible architecture that abstracts hardware complexity and
enables customers to focus their resources on innovation at the application level
Optimized for software-defined networking applications Balanced integration of CPU performance with network I/O and C-programmable
datapath acceleration that is right-sized (power/performance/cost) to deliver
advanced SoC technology for the SDN era
Extending the industry’s broadest portfolio of 64-bit multicore SoCs Built on the ARM® Cortex®-A57 architecture with integrated L2 switch enabling
interconnect and peripherals to provide a complete system-on-chip solution
QorIQ LS2 Family Key Features
Unprecedented performance and
ease of use for smarter, more
capable networks
High performance cores with leading
interconnect and memory bandwidth
• 8x ARM Cortex-A57 cores, 2.0GHz, 4MB L2
cache, w Neon SIMD
• 1MB L3 platform cache w/ECC
• 2x 64b DDR4 up to 2.4GT/s
A high performance datapath designed
with software developers in mind
• New datapath hardware and abstracted
acceleration that is called via standard Linux
objects
• 40 Gbps Packet processing performance with
20Gbps acceleration (crypto, Pattern
Match/RegEx, Data Compression)
• Management complex provides all
init/setup/teardown tasks
Leading network I/O integration
• 8x1/10GbE + 8x1G, MACSec on up to 4x 1/10GbE
• Integrated L2 switching capability for cost savings
• 4 PCIe Gen3 controllers, 1 with SR-IOV support
• 2 x SATA 3.0, 2 x USB 3.0 with PHY
SDN/NFV
Switching
Data
Center
Wireless
Access
See the LS2 Family First in the Tech Lab!
4 new demos built on QorIQ LS2 processors:
Performance Analysis Made Easy
Leave the Packet Processing To Us
Combining Ease of Use with Performance
Tools for Every Step of Your Design
© 2014 Freescale Semiconductor, Inc. | External Use
www.Freescale.com