Implementation of a Digital Processing Subsystem for a Long Wavelength Array Station

Robert Navarro (1), Elliott Sigman (1), Melissa Soriano (1), Douglas Wang (1), Larry D'Addario (1), Joe Craig (2) and Steve Ellingson (3)

2010 Jan 7

1. Jet Propulsion Laboratory, California Institute of Technology
2. University of New Mexico
3. Virginia Polytechnic Institute and State University

Copyright 2010. All rights reserved.


Long Wavelength Array, URSI 2010

LWA Overview

• Each station is an array of dipole-like elements in a 100 m diameter aperture for FOV = [8,2]°

• 10-88 MHz tuning range

• Construction of 256 dipole elements for LWA1 station complete.

• Up to 52 “stations” planned - mJy-class sensitivity

• Access to both GC & important northern regions

• Important astrophysical & ionospheric science

State of New Mexico, USA

• For more information, see the Proceedings of the IEEE paper on LWA (Ellingson, et al, Volume: 97 Issue: 8, Aug. 2009) http://www.ece.vt.edu/swe/lwa/memo/lwa0157.pdf

• Also visit the LWA web site: http://lwa.unm.edu/


LWA Station: Simplified Block Diagram

[Block diagram]

• Antennas: 256 dual-polarization dipole pairs in a ~100 m x ~100 m array, with integral LNAs.
• 512 analog receivers: gain and filtering.
• 512 12-bit ADCs, 196 MHz sampling clock.
• Digital Signal Processing (DP), with three kinds of output:
– Beams 1 to 4. Each beam independently points to and tracks any sky direction, with up to 19.6 MHz bandwidth, 2 center frequencies and 2 polarizations.
– Full-band transient buffer (on demand), 10-88 MHz.
– Full-sky, narrow band.
• Data Aggregation and Communication (or Data Recorder): to array correlator.
• Monitor/Control Computer: to/from array control; to/from all station subsystems.
• Diagram legend: Station Equipment Shelter; JPL responsibility (this paper).


Digital Signal Processing (DP) Subsystem

[Block diagram]

• Input from Analog Receivers: 512 channels.
• Digitizers: 26 boards, 10 antennas or 20 channels per board.
• Per-Antenna Processing: 26 boards, 20 channels/board.
– Full-bandwidth delay tracking and coherent summation for each beam.
– Full-bandwidth transient buffering.
– Narrow-bandwidth data streaming.
• Per-Beam Processing: filter, channelize and format for recording. 2 boards, 2 beams/board.
• Beam outputs: 4x 10GbE, one per 2-pol x 2-freq beam.
• Transient buffer or narrowband streaming outputs: 26x 1GbE, one per 10 antennas.
• DP Subsystem Control Computer: through a network switch to/from the embedded PowerPC in each of 28 processing boards, and to/from station monitor/control via the MCS network.
• Outputs go to the Data Aggregation & Communication Subsystem.


Per-Antenna Processing (2 Polarizations)

[Block diagram: one per-antenna block with X Pol and Y Pol paths]

• A/D input per polarization: 12 bits at 196 MHz.
• Beam 1 to Beam 4 submodules, with Partial Sum In and Partial Sum Out.
• TBW submodule: memory interface to 32 MB RAM, with trigger, to microprocessor.
• TBN submodule, per polarization: NCO (10 to 88 MHz, sin and cos) and mixers, each followed by a CIC filter (decimate by N/8) and an FIR filter (decimate by 8), to microprocessor. This is a two-stage low pass filter and decimate by N. Output BW 1 kHz to 100 kHz.
• Each DP1 functionality board includes 10 of these blocks.


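Beamformer Submodules

[Block diagram: one beamformer submodule]

• Inputs: X Pol and Y Pol, each at 196 MSamples/s.
• Coarse delay: FIFO per polarization.
• Fine delay: fractional-delay FIR per polarization.
• Matrix multiply (four multipliers, two adders): amplitude weighting and broadband polarization adjustment.
• Summation (Σ) per polarization: partial sums from previous stand are added, and partial sums go to the next stand, at 196 MSa/s.
• Signal widths labeled on the diagram: 12 bits and 20 bits.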

• Fractional Delay FIR filters could also contain extra coefficients for dispersion corrections.

• Matrix Multipliers would become FIR filters for frequency dependent polarization adjustments.
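The data flow of one beamformer submodule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the FPGA implementation: the function name `beam_stage`, the list-based sample streams and the zero padding at the start are all hypothetical, and the fractional-delay FIR is left out so that only the coarse delay, the 2x2 matrix multiply and the partial-sum accumulation are shown.

```python
def beam_stage(x, y, coarse_delay, matrix, partial_in):
    """One stand's contribution to one beam (hypothetical helper).

    x, y         : X Pol and Y Pol sample lists of equal length
    coarse_delay : whole-sample delay (the FIFO in the diagram)
    matrix       : ((a, b), (c, d)) weights for the 2x2 polarization multiply
    partial_in   : list of (sum_x, sum_y) pairs from the previous stand
    Returns the list of (sum_x, sum_y) pairs passed to the next stand.
    """
    out = []
    for n in range(len(x)):
        k = n - coarse_delay
        # Samples before the start of the stream are treated as zero.
        xd = x[k] if 0 <= k < len(x) else 0
        yd = y[k] if 0 <= k < len(y) else 0
        # Amplitude weighting and polarization adjustment.
        bx = matrix[0][0] * xd + matrix[0][1] * yd
        by = matrix[1][0] * xd + matrix[1][1] * yd
        # Add this stand to the running partial sums.
        px, py = partial_in[n]
        out.append((px + bx, py + by))
    return out


if __name__ == "__main__":
    zeros = [(0, 0)] * 4
    identity = ((1, 0), (0, 1))
    stand1 = beam_stage([1, 2, 3, 4], [10, 20, 30, 40], 1, identity, zeros)
    stand2 = beam_stage([1, 1, 1, 1], [1, 1, 1, 1], 0, identity, stand1)
    print(stand1)
    print(stand2)
```

Chaining the calls, as in the usage lines, mirrors the daisy chain: each stand receives the partial sums of the previous one and forwards its own.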


Fine Delay Tracking

• Delay corrections across entire band (10 to 88 MHz) require accuracies of at least 1.28 nsec to keep maximum loss of synthesized beam to under 7%.

• Delay corrections across tuning band (19.6 MHz) require accuracies of at least 6 nsec to keep maximum loss of synthesized beam to under 7%.

• With multiple tunings possible, delay for entire band must be supported.

• At 196 MHz sampling rates, integer sample delays possible to 5.1 nsec accuracy.

• Sub-sample delay adjustments can be implemented using FIR filters.

• Desired band of 10-88 MHz covers 0.1 to 0.9 of normalized frequency band.

• 18 FIR taps keep error under 0.03, 22 taps keep error under 0.01 (-40 dB).

• 18 to 22 taps recommended for sub-sample delay adjustments.

• FPGA hardware provides for up to 32 taps per beam.
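A sub-sample delay filter of the kind listed above can be sketched as follows. The slides do not say how the filter coefficients are designed, so the Hamming-windowed sinc used here is an assumption chosen for brevity, and the function names are hypothetical. The error figures quoted on the slide (0.03 and 0.01) come from the authors' own design and should not be expected from this simple window method.

```python
import cmath
import math

FS_MHZ = 196.0  # sampling rate from the slides


def sample_period_ns(fs_mhz=FS_MHZ):
    """Step size of an integer-sample delay, in nanoseconds."""
    return 1e3 / fs_mhz


def frac_delay_taps(num_taps, mu):
    """Hamming-windowed sinc taps (assumed design method).

    The filter delays by (num_taps - 1) / 2 + mu samples, where mu is the
    fractional part to be realized.
    """
    center = (num_taps - 1) / 2.0 + mu
    taps = []
    for n in range(num_taps):
        x = n - center
        ideal = 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    return taps


def max_band_error(taps, mu, lo=0.1, hi=0.9, points=200):
    """Largest |H - ideal| over lo..hi, in units of the Nyquist frequency."""
    center = (len(taps) - 1) / 2.0 + mu
    worst = 0.0
    for k in range(points + 1):
        w = math.pi * (lo + (hi - lo) * k / points)
        h = sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(taps))
        worst = max(worst, abs(h - cmath.exp(-1j * w * center)))
    return worst


if __name__ == "__main__":
    print(f"integer delay step: {sample_period_ns():.2f} ns")
    for n_taps in (18, 22, 32):
        err = max_band_error(frac_delay_taps(n_taps, 0.25), 0.25)
        print(n_taps, "taps, worst-case error over 0.1 to 0.9:", round(err, 4))
```

The first printed line reproduces the 5.1 nsec integer-delay step; the loop shows how the worst-case error over the 0.1 to 0.9 band can be evaluated for a given tap count.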


"Transient Buffer" Submodules

[Block diagram: repeat of the per-antenna processing diagram (A/D inputs, TBW submodule with 32 MB RAM, TBN submodule, Beam 1 to Beam 4 submodules)]

• Data from digitizers: 12b, 196 MHz, X and Y polarizations.
• Wideband Transient Buffer (TBW): 57 msec recording at 2x12 b/sample. 1000:1 duty cycle.
• Narrowband Transient Buffer (TBN): continuous readout at 2x12 b/sample and 100 kHz bandwidth.
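The 57 msec figure follows from the buffer size and the sample format. The short calculation below assumes that the 32 MB RAM in the diagram means 32 x 2^20 bytes per antenna and that the 1000:1 duty cycle means readout may take 1000 times as long as capture; both are readings of the slide rather than stated facts, and the function names are hypothetical.

```python
def tbw_capture_seconds(ram_bytes=32 * 2**20, bits_per_sample=2 * 12,
                        sample_rate_hz=196e6):
    """Time to fill one antenna's buffer with both polarizations."""
    return ram_bytes * 8 / (bits_per_sample * sample_rate_hz)


def tbw_mean_readout_mbps(antennas=10, duty_cycle=1000,
                          bits_per_sample=2 * 12, sample_rate_hz=196e6):
    """Average output rate for one board if readout is spread over the duty cycle."""
    return antennas * bits_per_sample * sample_rate_hz / duty_cycle / 1e6


if __name__ == "__main__":
    print(f"capture time: {tbw_capture_seconds() * 1e3:.2f} ms")
    print(f"mean readout rate per board: {tbw_mean_readout_mbps():.2f} Mb/s")
```

Under these assumptions the average readout rate for 10 antennas stays well below the capacity of one 1GbE link.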


Per-Beam Processing (Digital Receivers)

• Per-beam processing uses the same board as per-antenna processing, with different code.

• Each board processes two beams. Each beam includes 4 downconverters, 2 per polarization.

• Board input rate: 31.2 Gb/s of beamformed data for four beams.

• DP2 boards output DRX data for 4 beams, 2 polarizations, 2 tunings.

• Only two of five FPGAs used. Rest available for future expansion.

[Block diagram: digital receiver for one beam input (X or Y Pol)]

• Two parallel paths, one per tuning: NCO Tuning 1 and NCO Tuning 2, each providing sin and cos.
• Each path: complex multiply, then a CIC filter and an FIR filter on each of I and Q (low pass filter & decimation, bandwidth = 0.4 MHz to 19.6 MHz), then a filter bank with 4096 sub-bands.
• Other diagram labels: I Sum, Q Sum.
• Output: to data recorders (10GbE).
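The tuning path in the diagram (NCO, complex multiply, low-pass filter and decimation) can be illustrated with a minimal downconverter. This is a sketch under assumptions: a single integrate-and-dump stage, which behaves like a first-order CIC, stands in for the CIC plus FIR chain, and the function name `ddc` and its parameters are hypothetical.

```python
import cmath
import math


def ddc(samples, fs_hz, f_center_hz, decim):
    """Mix real samples to baseband with an NCO, then average and decimate.

    Averaging blocks of `decim` samples is a first-order CIC response; the
    hardware described in the slides uses a CIC followed by an FIR filter.
    """
    out = []
    acc = 0j
    for n, x in enumerate(samples):
        # NCO output: cos and -sin of the tuning phase as one complex number.
        nco = cmath.exp(-2j * math.pi * f_center_hz * n / fs_hz)
        acc += x * nco
        if (n + 1) % decim == 0:
            out.append(acc / decim)
            acc = 0j
    return out


if __name__ == "__main__":
    fs = 196e6
    # Test tone 100 kHz above a 49 MHz tuning frequency.
    tone = [math.cos(2 * math.pi * 49.1e6 * n / fs) for n in range(1600)]
    iq = ddc(tone, fs, 49e6, 8)
    print(len(iq), "complex samples at", fs / 8 / 1e6, "MSa/s")
    print("first output magnitude:", round(abs(iq[0]), 3))
```

A real unit-amplitude tone near the tuning frequency comes out as a complex baseband signal of magnitude close to 0.5, since the mirror-image component is rejected by the low-pass stage.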


Processing Board Overview

• The Processing Board is the main digital signal processing hardware for the Long Wavelength Array project.

• Uses:
– As a Digital Beamformer: For each of 10 antennas (20 channels), combines sample streams for two polarizations into 4 independently-steerable beams.
– As a Digital Receiver: For each beam, two independent 19.2 MHz bands are selected and channelized into 4096 contiguous channels.

• Form Factor:
– 20-layer, 322 by 280 mm board for ATCA chassis. Chassis holds 14 boards.
– Digitizer Board implemented as rear transition module connecting to Processing Boards that are used for per-antenna processing. Provides separation between analog and digital circuits.

• Board Statistics and Parts:
– Uses five XC5VSX50T FPGAs.
– Uses one PPC440EPx embedded processor.
– Uses ten 512 Mbit DDR2 DRAM for Wideband Transient Buffers.

• Inputs/Outputs:
– 20 ADC Inputs: Each ADC provides 12 bits at 196 MHz, sent over six differential pairs DDR. Sampling clock signal accompanies ADC data.
– Processor Interface IO: Two Gigabit Ethernet ports, 1 RS232 port. JTAG port available for debugging and PROM programming.
– Four 10GbE ports using CX4 connectors.
– ATCA Chassis Zone 2 backplane: One XAUI (4 RocketIO) input/output connection to every other board in the chassis.


Processing Board: FPGA Centric View

[Block diagram: five SX50T FPGAs, numbered (1) to (5)]

• Each SX50T: 4 A/D inputs, DDR, over 24 differential pairs, and an attached DRAM.
• Differential-pair daisy chain between FPGAs, links labeled 32, ending "To SX50T (1)". Beams labeled beam1 to beam4.
• Beam daisy chain on RocketIO (in sets of 4) to two front panel connectors and, via a crossbar, to the backplane full mesh fabric.
• Legend: Rocket IO; Diff Pairs.

Pin count as listed on the slide:

addr          21
cntrl          7
Program        4
DRAM          66
diff pairs   320
clks           8
Total pins   458
Used pins    480
unused pins   22

Differential-pair based daisy chains use 64 (4*16) pairs in and 64 out and run at 532 MHz (133*4). Each beam has 8512 Mbits/sec. Each pair also needs 1 sync diff pair bit.

Each SX50T inputs 2 stands or 4 A/D inputs. Assuming each A/D has 12 bits, this would take 48 differential pairs. With double data rate (DDR) clocking, the number of pairs is reduced to 24.

Total of 128+24+8 = 160 diff pairs per chip. Still need 3 differential clocks (192 MHz, 156.25 MHz and 133 MHz). Also need PPC EBC interface (about 40 single-ended pins).
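The pair and rate bookkeeping in these notes can be re-derived in a few lines. The sketch only restates the slide's own numbers; the function name and dictionary keys are hypothetical, and the final 8 pairs are taken from the slide's sum without assuming what they carry.

```python
def daisy_chain_budget(beams=4, pairs_per_beam=16, pair_rate_mbps=133 * 4,
                       adc_inputs=4, adc_bits=12, extra_pairs=8):
    """Differential-pair budget for one FPGA, using the numbers on the slide."""
    chain_in = beams * pairs_per_beam            # 4*16 pairs into the chip
    chain_total = 2 * chain_in                   # same number out again
    adc_pairs_ddr = adc_inputs * adc_bits // 2   # DDR halves the pair count
    total_pairs = chain_total + adc_pairs_ddr + extra_pairs
    return {
        "chain_pairs_in": chain_in,
        "chain_pairs_total": chain_total,
        "beam_rate_mbps": pairs_per_beam * pair_rate_mbps,
        "adc_pairs_ddr": adc_pairs_ddr,
        "total_pairs": total_pairs,
        "total_pins": 2 * total_pairs,           # two pins per differential pair
    }


if __name__ == "__main__":
    for name, value in daisy_chain_budget().items():
        print(f"{name}: {value}")
```

With 16 pairs at 532 Mb/s each, one beam gets 8512 Mb/s, and 160 pairs correspond to the 320 differential-pair pins in the pin count.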


Processing Board Photo


Digitizing Board

• Digitizing Board has 20 ADC chips (AD9230BCPZ-210).

– 12 bit samples [ENOB of 10.4 @ fIN up to 70 MHz @ 250 MSPS (−1.0 dBFS)]

– Sampled at 196 MHz

– 700 MHz analog input bandwidth

– SNR = 64.9 dBFS @ fIN up to 70 MHz @ 250 MSPS

• Implemented on ATCA chassis rear-transition module board.

• Analog functionality separated from digital processing through ATCA zone 3 connector.

• 16 layer PCB with ground planes between signal layers to support impedance control

• Input signals received differentially over CAT-7 cable on RJ-45 connectors. Each cable handles 4 channels.

• One additional RJ-45 connector carries 196 MHz and 1PPS clocks.

• ADC chips configured by PPC processor through SPI bus on Zone 3 connector.


Software Development

• Monitor & Control Software

– Top level Monitor & Control Software (MCS) developed at Virginia Tech. Digital Processing Subsystem Control Computer software developed at JPL.

– Top level MCS Software sends configuration commands to the dedicated Digital Subsystem Control Computer and receives status messages.

– The Digital Subsystem Control Computer also interfaces to all 28 digital Processing Boards via their embedded PPC440EPx processors.

– The MCS computer and Digital Subsystem Control Computer are Linux based machines.

– JPL Digital Subsystem software will initially support the TBW (wideband transient capture) command. The TBW command will specify a trigger time, at which the TBWs will begin acquiring data at 2x12 bits per sample until the 128 MB RAM is full.

– Full beamforming software functionality developed later in conjunction with FPGA firmware.

• Embedded Processor Software

– Processing Board runs the Debian GNU/Linux 5.0 (lenny) kernel and filesystem on the PPC440EPx processor, with Gigabit Ethernet network connectivity.

– Xilinx FPGAs communicate with the PPC440EPx as memory mapped devices through the processor's external bus controller (EBC) interface.

– Xilinx FPGAs are programmed through the EBC interface and GPIO lines.

– Custom Linux drivers allow user mode access of EBC and GPIO lines for programming of FPGA and memory mapped data transfers.

– Simple Linux command line programs allow Xilinx programming and data access: cfgxil, xrl, xwl.

– The two test pattern generators (TEST_DRAM_IN, TEST_ADC_IN) are used in combination with the TBW command for initial DP Board testing.
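Memory mapped access from user mode generally comes down to reading and writing fixed-width words at offsets within a mapped region. The sketch below shows that pattern only; it is not the code of cfgxil, xrl or xwl, whose internals are not described in the slides. The 32-bit word size, the offsets and the temporary file standing in for the device are assumptions, and the helper names are hypothetical. Big-endian byte order is used because the PowerPC is a big-endian processor.

```python
import mmap
import struct
import tempfile

WORD = struct.Struct(">I")  # one 32-bit big-endian word (assumed register width)


def read_word(region, offset):
    """Return the 32-bit word stored at byte `offset` of a mapped region."""
    return WORD.unpack_from(region, offset)[0]


def write_word(region, offset, value):
    """Store a 32-bit word at byte `offset` of a mapped region."""
    WORD.pack_into(region, offset, value)


if __name__ == "__main__":
    # A temporary file stands in for the device node exposed by a driver.
    with tempfile.TemporaryFile() as backing:
        backing.write(bytes(mmap.PAGESIZE))
        backing.flush()
        region = mmap.mmap(backing.fileno(), mmap.PAGESIZE)
        write_word(region, 0x10, 0xCAFE0001)   # hypothetical register offset
        print(hex(read_word(region, 0x10)))
        region.close()
```

On real hardware the mapped object would come from the custom driver rather than a temporary file, and the register map would be defined by the FPGA firmware.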


FPGA Firmware Development

• First Version FPGA firmware developed
– Provides interface to processor, DRAM, ADC chips, FPGA to FPGA diff pairs and high speed serial line test connectivity
– Provides wide-band transient buffer functionality
– Future Beamforming functionality shown in grey

[Block diagram: XC5VSX50T firmware]

• Blocks: ADC Interface, TEST_ADC_IN, 1PPS Timer, TBW Control, TBN Control, BFU, SDRAM Interface, PPC Interface, PDB Interface, MGT Interface, MGT Control, System Monitor, Clock Generator.
• External signals: ADC DATA & CLK, 1PPS, SDRAM Control & Data, PPC Control & Data, Beam Rear, Beam Front.
• Clocks: CLK_196MHz, CLK_156.25MHz, SYS_CLK, DP2 CLOCK.
• Legend: developed and verified in silicon; developed and verified in simulation; to be developed.


Current Status of Digital Processing Subsystem

• Prototype Processing Board fabricated, in test and verification stage
– Board power infrastructure verified
– Embedded processor JTAG, memory and boot flash verified
– FPGA programming through JTAG & processor interface verified
– FPGA to DRAM (TBW) connectivity verified
– Gigabit Ethernet connectivity verified
– Debian Linux operating system running

• Prototype Digitizing Board fabricated, first samples captured
– Board power infrastructure verified
– Communication with ADC chips configuration bus verified
– Samples captured and successfully passed to Processing Board over Zone 3 connector

• Initial FPGA firmware developed
– Provides interface to processor, DRAM, ADC chips, FPGA to FPGA diff pairs and high speed serial line test connectivity
– EBC and DRAM functionality verified in hardware
– Provides wide-band transient buffer (TBW) functionality

• Embedded processor software
– U-Boot and Embedded Linux from similar PowerPC based platforms (ROACH, Sequoia) modified for this platform
– Drivers for user mode access of EBC, GPIO and I2C interfaces developed


Backup slides follow


DP MCS and DP Network Switch Selection

• DP MCS Computer

– Dell PowerEdge 2970 2U rackmount server with

– Quad Core AMD Opteron™ 2372HE 2.1GHz 4x512K Cache

– 4GB DDR2, 4x1GB Single Ranked DIMMs

– 500GB SATA Hard Drive

– Intel PRO 1000PT 1GbE Dual Port NIC

– Rack Chassis w/Sliding Rapid/Versa Rails

• DP Network Switch

– Fujitsu (FUJ92MH) 48 Port 10/100/1000BASE-T Switch

– Fujitsu (SJ10GCX4A) Dual Port 10 GbE CX4 Uplink Card


Inter-Module Connections

• Connections from each DP1 to the next and from last DP1 to DP2 are needed only for the beam partial sums.

• 8 partial sums (4 beams, 2 polarizations) pass through 26 modules, thus requiring 208 (8 x 26) inter-module signals.

• Allowing for maximum bit growth due to accumulation of 256 antenna inputs, each complete beam sum signal consists of 20b samples for two polarizations at 196 MSa/sec = 7840 Mbps.

• Might want to only allow for beam bit growth to 16 bits.

• Xilinx RocketIO serial links on Virtex-5 FPGAs provide an efficient interconnection mechanism.

• Each beam, with two polarizations, will require four RocketIO serial links.

• All intra-chassis beam partial sums will be connected without cables through the ATCA full mesh backplane.

• Inter-chassis connections will be accomplished using CX4 cables (10GE style), 1 per beam.
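The word width and link rates in this list can be checked with a short calculation. It assumes 12-bit inputs (the ADC width given elsewhere in the slides), worst-case growth of one bit per doubling of the number of summed inputs, and an even split of a beam across its four serial links with no line-coding overhead; the function names are hypothetical.

```python
import math


def sum_width_bits(input_bits, num_inputs):
    """Word width that holds a sum of `num_inputs` values without overflow."""
    return input_bits + math.ceil(math.log2(num_inputs))


def beam_rate_mbps(sample_bits, polarizations=2, msamples_per_s=196):
    """Payload rate of one beam's partial sums, in Mb/s."""
    return sample_bits * polarizations * msamples_per_s


if __name__ == "__main__":
    width = sum_width_bits(12, 256)
    full = beam_rate_mbps(width)
    print("sum width:", width, "bits")
    print("beam rate:", full, "Mb/s, or", full / 4, "Mb/s per serial link")
    print("beam rate at 16 bits:", beam_rate_mbps(16), "Mb/s")
    print("signals for 8 partial sums over 26 modules:", 8 * 26)
```

Limiting growth to 16 bits, as the list suggests, would lower the per-beam payload from 7840 Mb/s to 6272 Mb/s.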


Using ATCA backplane for DP Beam Daisy Chain