Jet Propulsion Laboratory
California Institute of Technology
Implementation of a Digital Processing Subsystem for a Long
Wavelength Array Station
Robert Navarro (1), Elliott Sigman (1), Melissa Soriano (1), Douglas Wang (1), Larry D'Addario (1), Joe Craig (2) and Steve Ellingson (3)
2010 Jan 7
1. Jet Propulsion Laboratory, California Institute of Technology
2. University of New Mexico
3. Virginia Polytechnic Institute and State University
Copyright 2010. All rights reserved.
Long Wavelength Array
URSI 2010
LWA Overview
• Each station is an array of dipole-like elements in 100 m diameter aperture for FOV = [8,2]°
• 10-88 MHz tuning range
• Construction of 256 dipole elements for LWA1 station complete.
• Up to 52 “stations” planned - mJy-class sensitivity
• Access to both GC & important northern regions
• Important astrophysical & ionospheric science
• Location: State of New Mexico, USA
• For more information, see the Proceedings of the IEEE paper on LWA (Ellingson et al., vol. 97, no. 8, Aug. 2009): http://www.ece.vt.edu/swe/lwa/memo/lwa0157.pdf
• Also visit the LWA web site: http://lwa.unm.edu/
LWA Station: Simplified Block Diagram
[Block diagram; content as recoverable from the figure labels]
• Antennas: 256 dual-polarization dipole pairs in a ~100 m x ~100 m array, with integral LNAs.
• Analog receivers: 512 channels of gain and filtering.
• Digitizers: 512 12-bit ADCs, driven by a 196 MHz sampling clock.
• Digital Signal Processing (DP) outputs:
– Beams 1 to 4. Each beam independently points to and tracks any sky direction, with up to 19.6 MHz bandwidth, 2 center frequencies and 2 polarizations.
– Full-band (10-88 MHz) transient buffer, on demand.
– Full-sky, narrowband output.
• Data Aggregation and Communication (or Data Recorder): output to array correlator.
• Monitor/Control Computer: to/from array control, and to/from all station subsystems.
• Figure legend: Station Equipment Shelter; JPL responsibility (this paper).
Digital Signal Processing (DP) Subsystem
[Block diagram; content as recoverable from the figure labels]
• Input: 512 channels from the Analog Receivers.
• Digitizers: 26 boards, each handling 10 antennas (20 channels).
• Per-Antenna Processing: 26 boards, 20 channels per board.
– Full-bandwidth delay tracking and coherent summation for each beam.
– Full-bandwidth transient buffering.
– Narrow-bandwidth data streaming.
• Per-Beam Processing: 2 boards, 2 beams per board. Filter, channelize and format for recording.
• Outputs to the Data Aggregation & Communication Subsystem:
– Beam outputs: 4x 10GbE, one per 2-pol x 2-freq beam.
– Transient buffer or narrowband streaming outputs: 26x 1GbE, one per 10 antennas.
• DP Subsystem Control Computer: connects through a network switch to/from the embedded PowerPC in each of the 28 processing boards, and to/from station monitor/control via the MCS network.
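The board counts on this slide follow from simple arithmetic. The short check below is only a sketch that uses the numbers quoted on the slides (256 antennas, 10 antennas per board, 4 beams, 2 beams per per-beam board); the variable names are illustrative and not from the original.

```python
import math

ANTENNAS = 256               # dual-polarization antennas in the station
ANTENNAS_PER_BOARD = 10      # per digitizer / per-antenna processing board
BEAMS = 4
BEAMS_PER_BEAM_BOARD = 2

# 256 antennas do not divide evenly by 10, so the count rounds up.
per_antenna_boards = math.ceil(ANTENNAS / ANTENNAS_PER_BOARD)
per_beam_boards = BEAMS // BEAMS_PER_BEAM_BOARD
processing_boards = per_antenna_boards + per_beam_boards

# Two polarizations per antenna: channel capacity versus channels needed.
channel_capacity = per_antenna_boards * ANTENNAS_PER_BOARD * 2
channels_needed = ANTENNAS * 2

print(per_antenna_boards, per_beam_boards, processing_boards)
print(channel_capacity, channels_needed)
```

The 26 per-antenna boards provide slightly more channel capacity than the 512 channels in use, which is why the last board is only partly populated in this reading.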
Per-Antenna Processing (2 Polarizations)
[Block diagram; content as recoverable from the figure labels]
• Inputs: two A/D converters (X Pol and Y Pol), 12 bits each at 196 MHz.
• Beam 1 to Beam 4 Submodules: each takes Partial Sum In and produces Partial Sum Out.
• TBW Submodule: Memory Interface to 32 MB RAM, with Trigger; data goes to the microprocessor.
• TBN Submodule: for each polarization, an NCO (10 to 88 MHz, sin and cos) and mixers, followed by a two-stage low-pass filter that decimates by N (CIC filter, decimate by N/8, then FIR filter, decimate by 8). Output bandwidth 1 kHz to 100 kHz; data goes to the microprocessor.
• Each DP1 functionality board includes 10 of these blocks.
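Beamformer Submodules
[Block diagram; content as recoverable from the figure labels]
• Inputs: X Pol and Y Pol, 12 bits each at 196 MSamples/s.
• Coarse Delay: FIFO, one per polarization.
• Fine Delay: fractional-delay FIR, one per polarization.
• Matrix Multiply: amplitude weighting and broadband polarization adjustment (four multipliers, two adders).
• Σ: each polarization is added to the partial sums from the previous stand; the result goes out as partial sums to the next stand at 196 MSa/s.
• Bus widths labeled in the figure: 12 bits and 20 bits.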
• Fractional Delay FIR filters could also contain extra coefficients for dispersion corrections.
• Matrix Multipliers would become FIR filters for frequency dependent polarization adjustments.
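The per-stand chain described here (coarse delay, fine-delay FIR, 2x2 matrix multiply, add to the incoming partial sum) can be sketched in a few lines of Python. This is a floating-point illustration only, not the FPGA implementation; the function names, the list-based signal representation and the test signals are all assumptions made for the example.

```python
def fir(samples, taps):
    """Causal FIR filter; output has the same length as the input."""
    out = []
    for n in range(len(samples)):
        acc = 0.0
        for k, h in enumerate(taps):
            if n - k >= 0:
                acc += h * samples[n - k]
        out.append(acc)
    return out


def stand_contribution(x, y, coarse_delay, taps_x, taps_y, weights, partial_in):
    """One stand's step in the beam daisy chain (illustrative only)."""
    # Coarse delay: shift each polarization by a whole number of samples.
    xd = ([0.0] * coarse_delay + list(x))[: len(x)]
    yd = ([0.0] * coarse_delay + list(y))[: len(y)]
    # Fine delay: sub-sample delay applied with a short FIR filter.
    xf = fir(xd, taps_x)
    yf = fir(yd, taps_y)
    # 2x2 matrix multiply: amplitude weighting and polarization adjustment.
    (wxx, wxy), (wyx, wyy) = weights
    out_x = [wxx * a + wxy * b for a, b in zip(xf, yf)]
    out_y = [wyx * a + wyy * b for a, b in zip(xf, yf)]
    # Add to the partial sums arriving from the previous stand.
    sum_x = [p + o for p, o in zip(partial_in[0], out_x)]
    sum_y = [p + o for p, o in zip(partial_in[1], out_y)]
    return sum_x, sum_y


# Two stands see the same impulse 2 samples apart. Delaying the earlier
# arrival by 2 samples lines them up so that they add coherently.
n = 16
late = [0.0] * n
late[5] = 1.0
early = [0.0] * n
early[3] = 1.0
identity = ((1.0, 0.0), (0.0, 1.0))

beam = ([0.0] * n, [0.0] * n)
beam = stand_contribution(late, late, 0, [1.0], [1.0], identity, beam)
beam = stand_contribution(early, early, 2, [1.0], [1.0], identity, beam)
print(beam[0][5])
```

Replacing the single-tap filters with longer tap sets, or the four scalar weights with FIR filters, corresponds to the dispersion and frequency-dependent polarization extensions mentioned in the bullets above.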
Fine Delay Tracking
• Delay corrections across entire band (10 to 88 MHz) require accuracies of at least 1.28 nsec to keep maximum loss of synthesized beam to under 7%.
• Delay corrections across tuning band (19.6 MHz) require accuracies of at least 6 nsec to keep maximum loss of synthesized beam to under 7%.
• With multiple tunings possible, delay for entire band must be supported.
• At the 196 MHz sampling rate, integer-sample delays give a delay resolution of 5.1 nsec.
• Sub-sample delay adjustments can be implemented using FIR filters.
• Desired band of 10-88 MHz covers 0.1 to
0.9 of normalized frequency band.
• 18 FIR taps keeps error under 0.03, 22
taps keeps error under 0.01 (-40 dB)
• 18 to 22 taps recommended for sub-sample
delay adjustments.
• FPGA hardware provides for up to 32 taps
per beam.
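The sample period and normalized band edges quoted above can be checked directly, and a sub-sample delay filter can be sketched with a generic windowed-sinc design. The filter below is an assumption for illustration (a Hamming-windowed sinc), not the design behind the error figures on this slide, and it should not be expected to reproduce those figures across the 0.1 to 0.9 band.

```python
import math

FS_HZ = 196e6
sample_period_ns = 1e9 / FS_HZ                          # integer-delay step
band_norm = (10e6 / (FS_HZ / 2), 88e6 / (FS_HZ / 2))    # relative to Nyquist


def fractional_delay_taps(frac, num_taps=22):
    """Hamming-windowed sinc taps for a delay of (num_taps - 1) / 2 + frac samples."""
    center = (num_taps - 1) / 2 + frac
    taps = []
    for n in range(num_taps):
        t = n - center
        sinc = 1.0 if t == 0 else math.sin(math.pi * t) / (math.pi * t)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(sinc * window)
    gain = sum(taps)
    return [h / gain for h in taps]    # normalize to unity gain at DC


taps = fractional_delay_taps(0.25)     # quarter-sample delay, about 1.28 ns
print(round(sample_period_ns, 2), [round(b, 2) for b in band_norm], len(taps))
```

A quarter of a sample at 196 MHz is about 1.28 nsec, so a filter of this kind is the mechanism that would close the gap between the 5.1 nsec integer step and the full-band accuracy requirement.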
"Transient Buffer" Submodules
[Block diagram: repeat of the Per-Antenna Processing diagram, showing the TBW and TBN submodules]
• Data from digitizers: 12 b at 196 MHz, X and Y polarizations.
• Wideband Transient Buffer (TBW): 57 msec recording at 2x12 b/sample. 1000:1 duty cycle.
• Narrowband Transient Buffer (TBN): continuous readout at 2x12 b/sample and 100 kHz bandwidth.
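The 57 msec TBW figure is consistent with filling the 32 MB RAM shown in the per-antenna diagram at the full ADC rate. The arithmetic below is a sketch under two assumptions that are not stated on the slides: that "32 MB" means 32 x 2^20 bytes, and that the 1000:1 duty cycle means one capture per 1000 capture lengths.

```python
FS_HZ = 196e6                 # samples per second, per polarization
BITS_PER_SAMPLE = 2 * 12      # X and Y polarizations, 12 bits each
RAM_BYTES = 32 * 2**20        # assumption: "32 MB" read as 32 MiB
DUTY_CYCLE = 1000             # 1000:1, from the slide

write_rate_bps = FS_HZ * BITS_PER_SAMPLE         # rate into the buffer
capture_s = RAM_BYTES * 8 / write_rate_bps       # time to fill the RAM
cycle_s = capture_s * DUTY_CYCLE                 # assumed capture-to-capture time
avg_rate_bps = write_rate_bps / DUTY_CYCLE       # average readout rate per antenna

print(f"capture {capture_s * 1e3:.1f} ms, cycle {cycle_s:.0f} s, "
      f"average {avg_rate_bps / 1e6:.2f} Mb/s")
```

Under these assumptions the average readout rate per antenna is a few Mb/s, so ten antennas fit comfortably in one 1GbE output.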
Per-Beam Processing (Digital Receivers)
• Per-beam processing uses the same board as per-antenna processing, with different code.
• Each board processes two beams. Each beam includes 4 downconverters, 2 per polarization.
• Board input rate: 31.2 Gb/s of beamformed data for four beams.
• DP2 boards output DRX data for 4 beams, 2 polarizations, 2 tunings.
• Only two of the five FPGAs are used. The rest are available for future expansion.
[Block diagram; content as recoverable from the figure labels]
• Beam input (X or Y Pol) feeds two parallel tuning chains.
• Each chain (Tuning 1, Tuning 2): NCO (sin and cos) and complex multiply producing I and Q; low-pass filter and decimation (CIC filter then FIR filter on each of I and Q), bandwidth 0.4 MHz to 19.6 MHz; filter bank with 4096 sub-bands.
• Output: to Data Recorders (10GbE).
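The tuning chain in this diagram (NCO, complex multiply, low-pass filter and decimation) can be sketched as follows. This is a simplified illustration: a single boxcar average stands in for the CIC and FIR stages, the 4096-channel filter bank is not shown, and the function name and test tone are assumptions for the example.

```python
import cmath
import math

FS_HZ = 196e6


def downconvert(samples, tune_hz, decim):
    """Mix real samples to baseband, then boxcar-average and decimate."""
    mixed = [
        s * cmath.exp(-2j * math.pi * tune_hz / FS_HZ * n)   # NCO: cos - j*sin
        for n, s in enumerate(samples)
    ]
    out = []
    for start in range(0, len(mixed) - decim + 1, decim):
        block = mixed[start:start + decim]
        out.append(sum(block) / decim)   # boxcar stand-in for CIC + FIR
    return out


# A 49 MHz tone tuned to 49 MHz lands at DC with amplitude 0.5.
tone = [math.cos(2 * math.pi * 49e6 / FS_HZ * n) for n in range(200)]
baseband = downconvert(tone, 49e6, 10)    # 196 MSa/s / 10 = 19.6 MSa/s
print(len(baseband), abs(baseband[0]))
```

Decimating by 10 gives the widest output rate quoted on the slide; larger decimation factors would give the narrower bandwidths down to 0.4 MHz.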
Processing Board Overview
• The Processing Board is the main digital signal processing hardware for the Long Wavelength Array project.
• Uses:
– As a Digital Beamformer: for each of 10 antennas (20 channels), combines sample streams for two polarizations into 4 independently steerable beams.
– As a Digital Receiver: for each beam, two independent 19.6 MHz bands are selected and channelized into 4096 contiguous channels.
• Form Factor:
– 20-layer, 322 mm by 280 mm board for ATCA chassis. Chassis holds 14 boards.
– Digitizer Board implemented as a rear transition module connecting to the Processing Boards that are used for per-antenna processing. Provides separation between analog and digital circuits.
• Board Statistics and Parts:
– Uses five XC5VSX50T FPGAs.
– Uses one PPC440EPx embedded processor.
– Uses ten 512 Mbit DDR2 DRAMs for Wideband Transient Buffers.
• Inputs/Outputs:
– 20 ADC inputs: each ADC provides 12 bits at 196 MHz, sent DDR over six differential pairs. The sampling clock signal accompanies the ADC data.
– Processor interface I/O: two Gigabit Ethernet ports, one RS232 port. JTAG port available for debugging and PROM programming.
– Four 10GbE ports using CX4 connectors.
– ATCA chassis Zone 2 backplane: one XAUI (4 RocketIO) input/output connection to every other board in the chassis.
Processing Board: FPGA Centric View
[Block diagram; content as recoverable from the figure labels]
• Five SX50T FPGAs, numbered (1) to (5). Each receives 4 A/D inputs (DDR) over 24 differential pairs and has attached DRAM.
• Beams 1 to 4 pass from FPGA to FPGA over a differential-pair beam daisy chain (in sets of 4), with the chain returning to SX50T (1). Chain links are labeled 32 in the figure.
• RocketIO links (groups of 4) run to the front panel connectors and, via a crossbar, to the backplane full mesh fabric.
• Per-FPGA pin budget: addr 21, cntrl 7, program 4, DRAM 66, diff pairs 320, clks 8. Total pins 458 of 480 user pins, leaving 22 unused.
Differential-pair daisy chains use 64 (4*16) pairs in and 64 pairs out and run at 532 MHz (133*4). Each beam has 8512 Mbits/sec (16 pairs at 532 Mbits/sec each). Each pair also needs 1 sync diff pair bit.
Each SX50T inputs 2 stands, or 4 A/D inputs. Assuming each A/D has 12 bits, this would take 48 differential pairs. With double-data-rate (DDR) clocking, the number of pairs is reduced to 24.
Total of 128+24+8 = 160 diff pairs per chip. Three differential clocks are still needed (192 MHz, 156.25 MHz and 133 MHz), as is the PPC EBC interface (about 40 single-ended pins).
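The pair counts and rates in these notes can be reproduced with a short check. This is a sketch only: the 20-bit, two-polarization payload at 196 MSa/s is taken from the backup slide on inter-module connections, and the variable names are illustrative.

```python
PAIRS_PER_BEAM = 16
PAIR_RATE_MBPS = 532            # 133 MHz x 4
BEAMS = 4
ADC_INPUTS_PER_FPGA = 4
ADC_BITS = 12
SYNC_PAIRS = 8                  # the "8" in the 128+24+8 total

beam_capacity_mbps = PAIRS_PER_BEAM * PAIR_RATE_MBPS     # per-beam link capacity
beam_payload_mbps = 20 * 2 * 196                         # 20 b, 2 pol, 196 MSa/s
chain_pairs = 2 * BEAMS * PAIRS_PER_BEAM                 # 64 in + 64 out
adc_pairs = ADC_INPUTS_PER_FPGA * ADC_BITS // 2          # DDR halves 48 to 24
total_pairs = chain_pairs + adc_pairs + SYNC_PAIRS

print(beam_capacity_mbps, beam_payload_mbps, chain_pairs, adc_pairs, total_pairs)
```

The per-beam capacity exceeds the full 20-bit payload, so the daisy chain has some margin even without limiting bit growth.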
Processing Board Photo
Digitizing Board
• Digitizing Board has 20 ADC chips (AD9230BCPZ-210).
– 12 bit samples [ENOB of 10.4 @ fIN up to 70 MHz @ 250 MSPS (−1.0 dBFS)]
– Sampled at 196 MHz
– 700 MHz analog input bandwidth
– SNR = 64.9 dBFS @ fIN up to 70 MHz @ 250 MSPS
• Implemented on ATCA chassis rear-transition module board.
• Analog functionality separated from digital processing through ATCA zone 3 connector.
• 16-layer PCB with ground planes between signal layers to support impedance control.
• Input signals received differentially over CAT-7 cable on RJ-45 connectors. Each cable handles 4 channels.
• One additional RJ-45 connector carries 196 MHz and 1PPS clocks.
• ADC chips configured by PPC processor through SPI bus on Zone 3 connector.
Software Development
• Monitor & Control Software
– Top-level Monitor & Control Software (MCS) developed at Virginia Tech. Digital Processing Subsystem Control Computer software developed at JPL.
– Top-level MCS software sends configuration commands to the dedicated Digital Subsystem Control Computer and receives status messages.
– The Digital Subsystem Control Computer also interfaces to all 28 digital Processing Boards via their embedded PPC440EPx processors.
– The MCS computer and Digital Subsystem Control Computer are Linux-based machines.
– JPL Digital Subsystem software will initially support the TBW (wideband transient capture) command. The TBW command will specify a trigger time, at which the TBWs will begin acquiring data at 2x12 bits per sample until the 128 MB RAM is full.
– Full beamforming software functionality will be developed later in conjunction with FPGA firmware.
• Embedded Processor Software
– Processing Board runs a Debian GNU/Linux 5.0 (lenny) kernel and filesystem on the PPC440EPx processor, with Gigabit Ethernet network connectivity.
– Xilinx FPGAs communicate with the PPC440EPx as memory-mapped devices through the processor's embedded bus controller (EBC) interface.
– Xilinx FPGAs are programmed through the EBC interface and GPIO lines.
– Custom Linux drivers allow user-mode access to the EBC and GPIO lines for FPGA programming and memory-mapped data transfers.
– Simple Linux command-line programs allow Xilinx programming and data access: cfgxil, xrl, xwl.
– Two test pattern generators (TEST_DRAM_IN, TEST_ADC_IN) are used in combination with the TBW command for initial DP Board testing.
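As a general illustration of user-mode, memory-mapped word access of the kind described above, the sketch below reads and writes 32-bit big-endian words through Python's mmap module. It is not the project's driver interface or the cfgxil, xrl or xwl tools: a temporary file stands in for the device node, and the register offset and value are hypothetical.

```python
import mmap
import struct
import tempfile

WORD = struct.Struct(">I")   # 32-bit big-endian word (PowerPC byte order)


def write_word(mem, offset, value):
    """Store one 32-bit word at a byte offset in the mapped window."""
    WORD.pack_into(mem, offset, value)


def read_word(mem, offset):
    """Load one 32-bit word from a byte offset in the mapped window."""
    return WORD.unpack_from(mem, offset)[0]


# Stand-in for a device node: an ordinary temporary file of one page.
with tempfile.TemporaryFile() as backing:
    backing.write(bytes(mmap.PAGESIZE))
    backing.flush()
    with mmap.mmap(backing.fileno(), mmap.PAGESIZE) as mem:
        write_word(mem, 0x10, 0xCAFE0001)     # hypothetical register offset
        readback = read_word(mem, 0x10)

print(hex(readback))
```

On real hardware the mapping would come from whatever device node the custom driver exposes, which the slides do not name.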
FPGA Firmware Development
• First version of FPGA firmware developed:
– Provides interface to processor, DRAM, ADC chips, FPGA-to-FPGA diff pairs and high-speed serial line test connectivity.
– Provides wide-band transient buffer functionality.
– Future beamforming functionality shown in grey.
[Block diagram of the XC5VSX50T firmware; content as recoverable from the figure labels]
• Blocks: ADC Interface, TEST_ADC_IN, 1PPS Timer, TBW Control, TBN Control, BFU, SDRAM Interface, PPC Interface, PDB Interface, System Monitor, MGT Interface, MGT Control, Clock Generator.
• Signals: ADC DATA & CLK, 1PPS, CLK_156.25MHz, CLK_196MHz, SYS_CLK, DP2 CLOCK, SDRAM Control & Data, PPC Control & Data, Beam Rear, Beam Front.
• Legend: developed and verified in silicon; developed and verified in simulation; to be developed.
Current Status of Digital Processing Subsystem
• Prototype Processing Board fabricated, in test and verification stage:
– Board power infrastructure verified.
– Embedded processor JTAG, memory and boot flash verified.
– FPGA programming through JTAG and processor interface verified.
– FPGA to DRAM (TBW) connectivity verified.
– Gigabit Ethernet connectivity verified.
– Debian Linux operating system running.
• Prototype Digitizing Board fabricated, first samples captured:
– Board power infrastructure verified.
– Communication with ADC chips over the configuration bus verified.
– Samples captured and successfully passed to Processing Board over Zone 3 connector.
• Initial FPGA firmware developed:
– Provides interface to processor, DRAM, ADC chips, FPGA-to-FPGA diff pairs and high-speed serial line test connectivity.
– EBC and DRAM functionality verified in hardware.
– Provides wide-band transient buffer (TBW) functionality.
• Embedded processor software:
– U-Boot and Embedded Linux from similar PowerPC-based platforms (ROACH, Sequoia) modified for this platform.
– Drivers for user-mode access of EBC, GPIO and I2C interfaces developed.
Backup slides follow
DP MCS and DP Network Switch Selection
• DP MCS Computer
– Dell PowerEdge 2970 2U rackmount server with:
– Quad Core AMD Opteron™ 2372HE 2.1GHz 4x512K Cache
– 4GB DDR2, 4x1GB Single Ranked DIMMs
– 500GB SATA Hard Drive
– Intel PRO 1000PT 1GbE Dual Port NIC
– Rack Chassis w/Sliding Rapid/Versa Rails
• DP Network Switch
– Fujitsu (FUJ92MH) 48 Port 10/100/1000BASE-T Switch
– Fujitsu (SJ10GCX4A) Dual Port 10 GbE CX4 Uplink Card
Inter-Module Connections
• Connections from each DP1 to the next and from last DP1 to DP2 are needed only for the beam partial sums.
• 8 partial sums (4 beams, 2 polarizations) pass through 26 modules, thus requiring 208 inter-module signals.
• Allowing for maximum bit growth due to accumulation of 256 antenna inputs, each complete beam sum signal consists of 20b samples for two polarizations at 196 MSa/sec = 7840 Mbps.
• Might want to only allow for beam bit growth to 16 bits.
• Xilinx RocketIO serial links on Virtex-5 FPGAs provide an efficient interconnection mechanism.
• Each beam, with two polarizations, will require four RocketIO serial links.
• All intra-chassis beam partial sums will be connected without cables through the ATCA full mesh backplane.
• Inter-chassis connections will be accomplished using CX4 cables (10GbE style), 1 per beam.
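The bit-growth and link-rate reasoning on this slide can be written out as a short calculation. This is a sketch using only numbers from the slides (12-bit ADC samples, 256 antenna inputs, 196 MSa/s, two polarizations, four RocketIO links per beam); the variable names are illustrative.

```python
import math

ADC_BITS = 12
ANTENNAS = 256
FS_MSPS = 196
POLS = 2
LINKS_PER_BEAM = 4

# Summing 256 inputs can grow the word by log2(256) bits in the worst case.
growth_bits = math.ceil(math.log2(ANTENNAS))
sum_bits = ADC_BITS + growth_bits
beam_rate_mbps = sum_bits * POLS * FS_MSPS          # full bit growth
per_link_mbps = beam_rate_mbps / LINKS_PER_BEAM     # payload per serial link
rate_16b_mbps = 16 * POLS * FS_MSPS                 # if growth is capped at 16 bits

print(sum_bits, beam_rate_mbps, per_link_mbps, rate_16b_mbps)
```

Capping the beam word at 16 bits would cut the per-beam payload by a fifth, which is the trade-off the bullet on limiting bit growth points to.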
Using ATCA backplane for DP Beam Daisy Chain