This seminar provides an overview of the migration of all ESA missions controlled by ESOC from X.25 to TCP/IP, and from proprietary protocols between the Mission Control System and the ground stations to the CCSDS Space Link Extension (SLE) protocol.
OPS-G Forum 15 June 2007
OPS-G FORUM
15 June 2007
Migration of ESA Missions to TCP/IP and SLE
Presented by:
M. Bertelsmeier, OPS-ECT
E. M. Soerensen, OPS-ONV
G. Kerr, TOS-GDA
R. P. Bonilla, OPS-OAX
Agenda
• Communications Network Infrastructure (M. Bertelsmeier)
• Overall mission overview (E. Soerensen)
• XMM Challenge (G. Kerr)
• XMM case; problems encountered and solutions (R. Pérez Bonilla)
• Lessons learned
ESOC OPS-ECT
Migration of ESA Ground Station Networking to Internet Protocol
Presented by M. Bertelsmeier, OPS-ECT
Migration Drivers
• ESTRACK / OPSNET strategies: single protocol, use of COTS
• IP is the world-wide de facto standard
• IP support is an integral part of COTS TTC building blocks (vs. X.25 as an extra / exception with unknown future)
• Control Center internal support via LAN, TCP/IP
• MOC, SOC, SSC, SDC links supported via routers, TCP/IP
• IP standardised for SLE support
• Current packet-switched WAN nearing a crossroads towards a complete overhaul
• Future of X.25 parts and support
Migration Strategy
Boundary conditions at start of project (late 2000)
• no operational impact on missions in orbit or immediately before LEOP (at the time: ERS, XMM, Cluster, ENVISAT, Integral)
• new system to support future missions (success-oriented: Rosetta, LEOP planned for 2003)
Concept
• upgrade systems so that they can support current and future mode concurrently, subject to dynamic reconfigurations
  – implement “dual protocol support capability” on OPSNET and OPSNET subscribers
Context
• maximum alignment with
  – New Norcia Deep Space Station implementation,
  – Maspalomas upgrade and
  – ESTRACK stations back-end modernisations
Phases
• Phase 1 – Preparation and Verification (start late 2000)
  – software adaptations, testbed, end-to-end proof of concept, testing in Rosetta scenario, MEX scenario (high speed TM), including LAN roll-out in stations subject to back-end upgrades
• Phase 2 – Field deployment of dual capability (2001-2005)
  – completion of control center and stations upgrades
• Phase 3 – Mission migrations (2002 ff)
  – migrate operations support from X.25 to IP, adapted to mission / station use profiles
  – natural pace, done at windows of opportunity
• Phase 4 – Completion (2006 and ongoing)
  – withdrawal of OPSNET packet switching equipment
Protocol and System Features
[Figure: Source, Network and Sink shown against the layers L-1 Physical, L-2 Data Link, L-3 Network, L-4 Transport and L-5,6,7.]
Comparison of X.25 and TCP/IP:
• transmission control: X.25: packet switching network; TCP/IP: hosts with applications "aware"
• network provided layers: X.25: 1, 2, 3; TCP/IP: 3, 4
• acknowledgement and error recovery: X.25: all levels, segment by segment; TCP/IP: e-2-e, level 4
• "heartbeat": X.25: protocol inherent, level 3; TCP/IP: application to be adapted
ESTRACK OPSNET Links Before and After Migration
[Figure: OPSNET links before and after migration. Before: OCC ISS node (X25) and Station ISS node (X25), connected by leased lines (prime) with ISDN backup. After (target): OCC NetCore LAN and station, each with Router A/B, connected by leased lines (prime) with ISDN backup. Labelled elements include OPSLAN Core, OPSLAN, Station LAN, M&C LAN, Sim LAN, Ref. Stn LAN, NCTRS A/B, MCS A/B, Sim A/B/D, TMP, TCE, RNG, STC server and STC client, firewalls towards the ESA internal LANs, point-to-point Extranet links and the Internet, and the ESTRACK Security Perimeter. Some elements are marked "requires change".]
Topologies During Migration
Aim
• no additional line rental cost for support to two protocols
Topologies
• “Overlay”: for WAN links of poor capacity/price ratio (e.g. Kourou, Santiago, Maspalomas, Malindi): IP-OPSNET as frame relay overlay over X-OPSNET
• “Side-by-side”: for WAN links of 2 Mbps (KIR, NNO, PER, ESAC): IP-OPSNET and X-OPSNET side by side, using multiplexing interface converters between WAN line and switches / routers
Operations Scenarios During Migration
[Figure: dual support, single support and hybrid operations scenarios, showing NCTRS, OPSLAN (internal routers included), routers, G/W router, ISS (X25) nodes, interface converters and the WAN carrying TM and TC in X25 mode and IP mode, for the overlay and side-by-side topologies.]
Hybrid scenario: same NCTRS interacting with an X.25 station and an IP station
Requirements on IP-OPSNET
• Services
  – WAN: digital voice OCC - Stations (ca. 10...12 kbit/s)
  – WAN: data OCC - Stations (up to a few hundred kbit/s): TM, TC, STC client/server, orbital data, GPS, auxiliary data, service management, network management
  – LAN: data transit to / from OCC; all remaining data exchanges inside station, incl. M&C, UPS, BMS, FM (e.g. NNO)
• Security
• Capabilities
  – near “non-stop” availability --> reliability, redundancy, resilience
  – capacity --> performance, modularity, scalability
  – throughput --> performance, prioritisation, congestion management
• Environment
  – WAN circuits with delay and errors (benchmark: 400 ms delay one way, BER 10^-7 both ways)
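To give a feel for what the benchmark bit error rate means at packet level, the following minimal sketch computes the probability that a packet of a given size contains at least one corrupted bit. It assumes independent bit errors, which is a simplification (the burst errors mentioned later in the talk behave differently), and the packet sizes are illustrative values, not figures from the slides.

```python
def packet_error_probability(ber: float, packet_bytes: int) -> float:
    """Probability that at least one bit of a packet is corrupted.

    Assumes independent bit errors; burst errors are not modelled.
    """
    bits = 8 * packet_bytes
    return 1.0 - (1.0 - ber) ** bits


if __name__ == "__main__":
    # Illustrative packet sizes (assumptions, not taken from the slides).
    for size in (64, 576, 1500):
        p = packet_error_probability(1e-7, size)
        print(f"{size:5d} bytes: {p:.6f}")
```

At a BER of 10^-7 a 1500-byte packet is hit roughly once in every 830 packets, so TCP loss recovery is exercised regularly even on a link that meets the benchmark.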
Implemented Features / Performances
• Communications Systems
  – Automatic rerouting in case of line drops and equipment failures (distributed dynamic routing algorithm, Hot Standby Router Protocol (HSRP))
  – Throughput maximisation: tuned Frame Relay interface between Cisco routers and Netrix nodes (during overlay phase)
  – Hierarchical bandwidth reservations and prioritisation, better than X.25 (“Quality of Service” system, feasible for on-line and off-line)
  – Provisions for Voice over IP integration
• Subscribers
  – Feasible UNIX system configurations under Sun Solaris 2.6 and above
  – Tuned TCP stacks to cope with high-delay, high-BER environments
• End-to-End Connections
  – Stable performance for real-time telemetry at rates up to 256 kbit/s with RTT of 800 ms and BER of 10^-7
  – Delta-DOR throughput over load-sharing pair of E1 lines: 95% of wire-speed
Present Status
• SVA, CEB: X.25 never deployed
• KIR, MSP, KRU, RED, MAL: X.25 idle or already de-installed
• NNO, PER, VIL: X.25 equipment to be freed of voice support. (VIL scheduled next week.)
• AGO: X.25 still in use. The current leased line (128k) is insufficient for XMM retransmission needs and is awaiting cancellation; a new leased line is not planned because of the AGO use prediction. An alternate link concept is under discussion.
• “IP”-OPSNET is now the “OPSNET”
• OPSNET SLE-ready (except AGO)
Papers
EXCITE – The Migration of the ESA TTC Network to TCP/IP, TTC 2001
The Evolution of ESA Ground Station Communications to Internet Protocol, SpaceOps 2002
Network Security and SLE / IP Internetworking for Inter-Agency Cooperation, SpaceOps 2004
A Novel Approach for Ground Stations Communications within the ESTRACK Network of ESA, DASIA 2005
A Novel and Cost Effective Communications Platform for the ESA Stations Network, RCSGSO 2005
Information Technology Solutions for Delta-DOR Large Volume Data Transfers, SpaceOps 2006
New Communications Solutions for ESA Ground Stations, ESA Bulletin February 2006
ESOC OPS-ONV
Migration of Missions to SLE: Overall perspective
Presented by E. M. Soerensen OPS-ONV
Scope of work
• Strategy: only SLE will be used in the future (longer term)
• 9 missions needed to be migrated to SLE
• NCTRS upgraded to support SLE
• 13 stations to be upgraded, as of 2000
  – some stations have TMTCS installed and some have CORTEX; both support SLE
• A total of 28 configurations (mission/station combinations) had to be implemented and validated
Mapping of Missions and Stations (2004)
[Table: stations Vilspa I, Vilspa II, TS 1, Redu, Kiruna I, Kiruna II, Kourou, Perth, New Norcia, Maspalomas, Malindi, Svalbard and Santiago against the missions ERS-2, ENVISAT, CLUSTER, XMM, INTEGRAL, ROSETTA, MEX, SMART-1 and OTHER; per-station assignments not recoverable.]
Challenge
• The SCOS-1 Missions (ERS-2, ENVISAT and CLUSTER) were a special challenge because they use VMS and SLE is not supported on VMS
• Solution: migrate to SUN-based NCTRS for these missions
• Successfully done
  – N.B. these missions were the first at ESOC to use SLE fully
Status Summary
Station       Status
Cebreros      IP-OPSNET, X25 Removed
Kiruna        IP-OPSNET, X25 Removed
Kourou        IP-OPSNET, X25 Removed
Maspalomas    IP-OPSNET, X25 Removed
New Norcia    IP-OPSNET, X25 Removed
Perth         IP-OPSNET, X25 Removed
Redu          IP-OPSNET, X25 Removed
Vilspa        IP-OPSNET, X25 Removed
Santiago      X25 still in use – plan: service contract providing SLE services
SLE Service Providers
[Map of SLE service provider sites]
Sites:
• Goldstone, CA, U.S.
• Madrid, Spain
• Canberra, Australia
• Kourou, French Guiana
• Cebreros, Spain
• Villafranca, Spain
• Maspalomas, Gran Canaria Island
• Redu, Belgium
• Kiruna, Sweden
• Svalbard, Norway
• Weilheim, Germany
• Malindi, Kenya
• Perth, Australia
• New Norcia, Australia
• Tromsø, Norway
• Esrange, Sweden
• St-Hubert, Canada
• Kerguelen, France
• Hartebeesthoek, Republic of South Africa
• Aussaguel, France
• China
Operators/Networks:
• ESA/ESTRACK
• NASA/JPL/DSN
• DLR
• NSC/KSAT
• SSC/Prioranet
• CSA
• CNES
SLE Service Users
[Map of SLE service user sites]
• NASA/JPL, Pasadena, CA, U.S.
• Lockheed Martin, Denver, CO, U.S.
• JHU/APL, Laurel, MD, U.S.
• NASA/GSFC, Greenbelt, MD, U.S.
• ESA/ESOC, Darmstadt, Germany
• DLR/GSOC, Oberpfaffenhofen, Germany
• JAXA/ISAS, Sagamihara City, Japan
• CNES, Toulouse, France
• China
XMM Challenge
Presented by G. Kerr, TOS-GDA
LINK TO MAIN Ground Station KOUROU
INITIAL CONSIDERATIONS
• XMM commands mainly in real time (~1000s of commands per hour)
• Implicit timing constraints on commanding are embedded in the database
• Commands sent via X.25 receive G/S confirmation from Kourou in 2-3 s: OK
• Commands sent via TCP/IP receive G/S confirmation from Kourou in 6-10 s: NOT acceptable
• INTEGRAL changed the TCP/IP buffer sizes on the TMP at Redu; this is not an option for XMM at Kourou (multi-mission)
• We concentrated on the NCTRS (TCP/IP negotiates between computers)
PROBLEMS ENCOUNTERED (1)
• The underlying cause of the delays was not initially clear.
• Which TCP/IP parameters could/should be modified, and how to get TCP/IP expertise?
• Confusing and contradictory documentation, mainly aimed at maximising bandwidth utilisation (we have guaranteed bandwidth).
• Different TCP/IP parameter sets on TCE and NCTRS: difficult to establish an equivalence.
• No root privileges to change anything anyway; strong (and understandable) opposition to changing TCP/IP on OPSLAN.
PROBLEMS ENCOUNTERED (2)
• No useful analysis tools available for us to use: sniffer/snoop output was difficult to interpret; TCPTRACE / TCPDUMP were requested but not allowed on OPSLAN.
• Initial testing on the ESOC Reference Station was impractical and not representative (satellite link, frame relay over part of link, router delays, etc.).
• G/S operator support was needed to help set up the CLCW path from PSS to TCE; setup time was often some hours.
XMM case: Problems and Solutions
Presented by R. P. Bonilla, OPS-OAX
X25 versus TCP/IP
X25
• connection-oriented protocol
• record based: data is organised in blocks and transmitted one block at a time
• creates packets containing information for reliability: no packet loss, and delivery in order is ensured
• no buffers
• data flow doesn’t use algorithms
TCP/IP
• TCP is connection oriented, running over the connectionless IP network layer
• stream based: data is organised as a stream of bytes, much like a file
• creates segments containing information for reliability: no segment loss, and delivery in order is ensured
• buffers at each end point store data to be transmitted until the other side is prepared to read it
• data flow is based on tuneable algorithms that manage buffers and coordinate traffic
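The record-based versus stream-based distinction has a practical consequence: over TCP the application has to mark record boundaries itself. The sketch below shows one common way to do this, a length prefix in front of each record. The 4-byte header and the function names are hypothetical choices for illustration; they are not the framing used by NCTRS, TCE or SLE.

```python
import socket
import struct

# Hypothetical framing: a 4-byte big-endian length in front of every record.
HEADER = struct.Struct("!I")


def send_record(sock: socket.socket, payload: bytes) -> None:
    """Write one record to the byte stream, prefixed by its length."""
    sock.sendall(HEADER.pack(len(payload)) + payload)


def recv_exact(sock: socket.socket, n: int) -> bytes:
    """Read exactly n bytes; a single recv() may return fewer."""
    chunks = []
    while n > 0:
        chunk = sock.recv(n)
        if not chunk:
            raise ConnectionError("peer closed the connection mid-record")
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def recv_record(sock: socket.socket) -> bytes:
    """Read one length-prefixed record back out of the byte stream."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    return recv_exact(sock, length)


def demo() -> list:
    """Send two records back to back and recover them as separate records."""
    a, b = socket.socketpair()
    try:
        send_record(a, b"short command")
        send_record(a, b"a considerably longer command")
        return [recv_record(b), recv_record(b)]
    finally:
        a.close()
        b.close()


if __name__ == "__main__":
    print(demo())
```

Without the length prefix the receiver would see one undifferentiated run of bytes, and how that run is cut into segments on the wire is decided by TCP, not by the application.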
TCP protocol stack
[Figure: TCP/IP stack end to end between ESOC (S2K NCTRS) and the ground station (TCE / TMP), each acting as sender and receiver. On each side the application calls write() / read() on the TCP transport layer (socket buffer, output queue / receive queue); segments travel as MTU-sized IP packets over IPv4 (with re-assembly buffers) through routers and the physical link, with TCP ACK packets returning.]
Definition of Delay used for analysis
Delay: the time the command takes to travel from the NCTRS to the TCE, plus the time the ‘acknowledgement message’ generated by the TCE takes to reach the NCTRS.
[Figure: path between ESOC (XMCS, XNCTRS) and the ground station (TCE, TMP): application write() / read(), output and receive queues with buffers, MTU packets and TCP acknowledgements passing through routers over a satellite or terrestrial link.]
TCP/IP vs X25 (KRU PSS) delay with default TCP/IP
[Figure: histogram of TCP and X25 delays (Kourou PSS, prime VSAT link). X axis: delay in units of 0.1 s (1 to 79); Y axis: occurrences (0 to 450).]
CMD's    X25     TCP/IP
Short    1000    500
Long     1000    500
Delay Statistics
• Average (TCP-X25) = 3 s
• Maximum (TCP-X25) = 4.8 s
• Minimum (TCP-X25) = 0.95 s
Telemetry and Commanding transfer flow (1)
[Figure: sender (TMP) and receiver (NCTRS), each with application, TCP, IP and NIC layers. The sender application writes into send buffer A at rate Rp; TCP delivers segments to the network at rate Rn and waits for ACKs; packets arrive in receive buffer B, which holds data still to be read by the receiving application plus empty space.]
• Transfer window size depends on: buffer B free space; latency
• Receive buffer B ≥ RTT * Bandwidth
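The sizing rule "receive buffer B ≥ RTT * Bandwidth" can be turned into a quick calculation. The sketch below evaluates it for the 256 kbit/s and 800 ms RTT figures quoted earlier in the talk, giving 25 600 bytes, and shows how an application could request such a buffer with the standard SO_RCVBUF socket option. The function names are hypothetical, and the operating system may round or cap the requested size, so this is an illustration of the rule rather than the tuning actually applied at ESOC.

```python
import math
import socket


def min_receive_buffer_bytes(bandwidth_bit_s: int, rtt_ms: int) -> int:
    """Smallest receive buffer (bytes) satisfying buffer >= RTT * bandwidth."""
    # bit/s * ms gives millibits; dividing by 8000 converts to bytes.
    return math.ceil(bandwidth_bit_s * rtt_ms / 8000)


def open_tuned_socket(bandwidth_bit_s: int, rtt_ms: int) -> socket.socket:
    """Create a TCP socket and request a receive buffer of at least one
    bandwidth-delay product. The OS may round or cap the value."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(
        socket.SOL_SOCKET,
        socket.SO_RCVBUF,
        min_receive_buffer_bytes(bandwidth_bit_s, rtt_ms),
    )
    return sock


if __name__ == "__main__":
    # 256 kbit/s telemetry with an 800 ms round trip time.
    print(min_receive_buffer_bytes(256_000, 800))
```

A buffer smaller than this forces the sender to stop and wait for acknowledgements before the link is full, so the achievable rate drops below the line rate.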
TCP data flow (1)
Sender parameters
• Congestion window = amount of data injected into the network at a particular time.
• Congestion window max = determined by the link capacity (tuneable), and/or adjusted to the receiver buffer capacity.
• Data allowed to be sent = min [congestion window, window offered by receiver].
• Timeout timer = interval waited before retransmitting when an ACK has not been received.
Buffers and Windows
[Figure: congestion window (Kbytes) against time (s), showing the slow start phase.]
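The rule "data allowed to be sent = min [congestion window, window offered by receiver]" combined with slow start can be sketched in a few lines. The model below doubles the congestion window once per round trip up to a maximum and applies the min rule; it deliberately leaves out packet loss and congestion avoidance, and all numeric values are illustrative assumptions rather than XMM settings.

```python
def slow_start_windows(mss: int, receiver_window: int,
                       cwnd_max: int, rounds: int) -> list:
    """Amount of data the sender may have in flight in each round trip.

    Simplified model: the congestion window starts at one segment and
    doubles every round trip up to cwnd_max; no loss is simulated.
    """
    cwnd = mss
    usable = []
    for _ in range(rounds):
        # The sender is limited by whichever window is smaller.
        usable.append(min(cwnd, receiver_window))
        cwnd = min(2 * cwnd, cwnd_max)
    return usable


if __name__ == "__main__":
    # Illustrative values: 1000-byte segments, 6000-byte receiver window.
    for rtt_index, window in enumerate(slow_start_windows(1000, 6000, 16000, 6)):
        print(f"round trip {rtt_index}: {window} bytes in flight")
```

On a link with an 800 ms round trip each doubling step costs 0.8 s, which is why a connection that falls back to slow start takes seconds to return to its nominal rate.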
TCP algorithms behavior
XMM specific:
• Rp = data rate delivered from TMP to the TCP layer (70 kbit/s)
• Rn = data rate delivered by TCP to the network
• Rline = physical capacity of the comms link
[Figure: rate (kbit/s) against time (s), showing slow start, nominal operations, packet loss and recovery phases.]
Undesirable TCP/IP behaviour in the presence of packet loss
TCP data flow (2)
Timers for ACK control system performance
Receiver
• tcp_deferred_acks_max (1 → 8 segments): maximum number of TCP segments received before forcing out an ACK.
• tcp_deferred_ack_interval (0.1 s): maximum time the receiver holds back a deferred ACK before sending it.
Sender
• Timeout timer (time the sender waits to receive an ACK before retransmitting): initial 3 s, min 0.4 s, max 60 s.
[Figure: sender/receiver timeline with data and ACKs, repeated "rexmit data" on timer expiry, then reset.]
Timers not optimised for XMM latency
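How the sender's timeout timer behaves under repeated loss can be illustrated with the usual exponential backoff: each unanswered retransmission doubles the timeout until the maximum is reached. The sketch below uses the initial (3 s) and maximum (60 s) values shown on the slide; the function name is hypothetical, and real TCP stacks also adapt the timeout to the measured round trip time, which this sketch ignores.

```python
def retransmission_schedule(initial_s: float, max_s: float,
                            attempts: int) -> list:
    """Timeout applied to each successive retransmission attempt.

    Simplified model: pure exponential backoff capped at max_s,
    without any adaptation to the measured round trip time.
    """
    timeouts = []
    rto = initial_s
    for _ in range(attempts):
        timeouts.append(rto)
        rto = min(2 * rto, max_s)  # double after every expiry, up to the cap
    return timeouts


if __name__ == "__main__":
    print(retransmission_schedule(3, 60, 6))
```

With a round trip of around 0.8 s, a timeout close to the 0.4 s minimum expires before the ACK can possibly arrive, while a loss at the 3 s initial value already costs several command cycles.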
XMM problem (1)
• In our test setup, commands were released from the NCTRS to TCP every 2 seconds, but release from TCP to the network layer was much slower: commands were buffered at the sender side in order to fill the MTU size.
• Acknowledgements were NOT released from the TCE back to the NCTRS as soon as a segment was received: ACKs were buffered at the receiver side.
• It was not possible to achieve a reasonable ‘end-to-end delay’ (MCS to S/C) (maximum 5 s).
Undesirable behaviour:
• Buffering of commands at the NCTRS (TCP level), and of ACKs at the TCE (TCP level).
• A TCP segment is not equal to the Max Transfer Unit (MTU) and also not equal to the longest command length.
• TCP transmits the data as a stream of bytes, unrelated to application coding.
TCP interval between successive ACKs: KRU TCP/IP default set-up
[Figure: interval between successive acks at TCE (observed at application level), 0 to 7 on the Y axis, against release time (12:00:00 to 13:26:24).]
Acknowledgements not synchronous with command release
XMM Solution (1)
• Set the Max Transfer Unit (MTU) to approximately the size of one command of the longest length.
• Encapsulation of MTU size between NCTRS and TCE.
• The number of segments received before forcing an ACK was set to 1 (only on the NCTRS; default value = 8). An equivalent parameter was not found on the TCE.
• Telemetry and ACKs were separated into 2 different data streams (applying independent Quality of Service to each one on the G/S routers).
• TMP set to deliver data every 1 second, instead of the default value of 2 seconds.
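The effect of lowering the "segments before forcing an ACK" threshold can be illustrated with a toy receiver model. In the sketch below an ACK is released either when the number of unacknowledged segments reaches a threshold or when a deferred-ACK timer expires, whichever comes first. The parameter names echo the tunables on the earlier slide, but the model is a strong simplification and an assumption of this sketch: it does not reproduce the multi-second delays measured on the real link, only the difference between deferred and immediate acknowledgement.

```python
def ack_release_times(arrivals_ms: list, acks_max: int,
                      ack_interval_ms: int) -> list:
    """Times (ms) at which a simplified receiver releases ACKs."""
    acks = []
    pending = 0      # segments received but not yet acknowledged
    deadline = None  # expiry time of the deferred-ACK timer
    for t in arrivals_ms:
        # The timer may have expired before this segment arrived.
        if pending and deadline <= t:
            acks.append(deadline)
            pending = 0
        if pending == 0:
            # The first unacknowledged segment starts the timer.
            deadline = t + ack_interval_ms
        pending += 1
        if pending >= acks_max:
            # Threshold reached: acknowledge at once.
            acks.append(t)
            pending = 0
    if pending:
        # The timer flushes whatever is still outstanding.
        acks.append(deadline)
    return acks


if __name__ == "__main__":
    commands = [0, 2000, 4000]  # one segment every 2 s, as in the test setup
    print("threshold 8:", ack_release_times(commands, 8, 100))
    print("threshold 1:", ack_release_times(commands, 1, 100))
```

With a threshold of 1 every segment is acknowledged on arrival, so the ACK timing follows the command release timing instead of the receiver's timer.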
TCP interval between successive ACKs (tuned parameters on NCTRS)
[Figure: interval between successive acks at TCE (Reference Station, observed at application level), 0 to 3.5 on the Y axis, against release time (14:24:00 to 15:00:00).]
Statistics    Short cmd's    Long cmd's
Average       2 s            2 s
Max           2.75 s         2.87 s
Min           1.31 s         1.15 s
Desired behaviour after tuning: ACKs synchronised with commands
TCP vs X25, tuned parameters on NCTRS (Reference Station set up as KRU G/S)
[Figure: histogram of TCP and X25 delays. X axis: delay in units of 0.1 s (1 to 61); Y axis: occurrences (0 to 450).]
Statistics, (TCP-X25) delay
• Average = -0.02 s
• Maximum = -0.58 s
• Minimum = -0.72 s
• StDev = -0.01
Statistics, TCP delay
              Short cmd's    Long cmd's
Average       1.21 s         2.21 s
Max           2.16 s         2.78 s
Min           0.9 s          1.85 s
STDEV         0.25           0.24
After tuning, TCP/IP now behaves as well as X25.
XMM problem (2)
• Unnecessary retransmissions at TCP level. Detection of packet loss causes a decrease of the send window, so the system starts to slow down. Packet loss is due to a hit on the link, or to the intrinsically high BER of the satellite link.
• Reception of telemetry packets up to 4 minutes late at the MCS, sometimes causing loss of data (‘FIFO buffer full’), due to burst errors on the satellite link.
[Figure: sender (TMP) and receiver (NCTRS): the sender application writes into send buffer A at rate Rp, and TCP delivers packets at rate Rn to the receiver's read buffer, part to be read and part empty. Labels: 2 MB, 70 kbit/s.]
XMM problem (2) - graphical
[Figure: throughput plot (dotted line) with markers a = nominal rate, b = bit error, c = burst error.]
XMM Solution (2)
• Increase the guaranteed bandwidth above the theoretical requirement for telemetry.
• Retransmission timers at the sender and receiver sides should be tuned: because of the latency of the transmission and the low speed of the link, packets are continuously retransmitted, even without errors on the link.
• Use Selective Acknowledgement (SACK) at the TMP and TCE.
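Two small calculations help explain why bandwidth above the nominal telemetry rate is needed. The first estimates how much telemetry time a sender buffer can hold; assuming the "2MB" and "70kbs" labels in the problem diagram refer to the send buffer and the telemetry rate, this gives about 229 s, close to the "up to 4 minutes late" that was observed. The second estimates how long it takes to drain the backlog built up during a link interruption, which only happens at all if the line rate exceeds the data rate. The function names, the decimal reading of "MB" and the line rates used below are assumptions for illustration.

```python
def buffer_holding_time_s(buffer_bytes: int, data_rate_bit_s: int) -> float:
    """Seconds of data a buffer can hold at a given production rate."""
    return buffer_bytes * 8 / data_rate_bit_s


def backlog_drain_time_s(outage_s: float, data_rate_bit_s: int,
                         line_rate_bit_s: int) -> float:
    """Seconds needed to clear the backlog accumulated during an outage.

    New data keeps arriving at data_rate_bit_s while the backlog drains,
    so only the headroom (line rate minus data rate) reduces it.
    """
    if line_rate_bit_s <= data_rate_bit_s:
        raise ValueError("no headroom: the backlog never drains")
    backlog_bits = outage_s * data_rate_bit_s
    return backlog_bits / (line_rate_bit_s - data_rate_bit_s)


if __name__ == "__main__":
    # 2 MB read as 2 000 000 bytes (assumption), telemetry at 70 kbit/s.
    print(round(buffer_holding_time_s(2_000_000, 70_000)))
    # Hypothetical line rates: 80 kbit/s and 140 kbit/s after a 10 s outage.
    for line_rate in (80_000, 140_000):
        print(line_rate, backlog_drain_time_s(10, 70_000, line_rate))
```

With only 10 kbit/s of headroom a 10 s interruption takes 70 s to recover from; doubling the line rate brings this down to 10 s.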
SLE
[Figure: the TCP/IP stack between ESOC and the ground station, with SLE as the application layer above TCP on both sides: NCTRS at ESOC and TMTCS at the ground station. Application write() / read(), TCP output and receive queues, segments carried as MTU-sized IP packets through routers over the physical link, with TCP ACK packets returning.]
LESSONS LEARNED (1)
• TCP/IP should be tuned, and tuning is a very complex exercise.
• Each mission should have one person with final responsibility for ensuring an appropriate comms setup, and with full root authority on all related computers, end-to-end.
• Ensure strengthening and maintenance of system-level expertise in TCP/IP concepts.
• The XMM TCP/IP migration effort was radically underestimated.
LESSONS LEARNED (2)
• The effort of rolling out a new system involving network infrastructure and multiple missions is considerable and was underestimated.
• When introducing new protocols (TCP/IP, SLE), adequate access to stations and network for operations validation on each mission is critical and must be taken into account.
• The decision to take SLE as the single supported protocol for ESA or third party missions was correct.
LESSONS LEARNED (3)
• Dual-protocol-capable networks and platforms were a very good concept, allowing migrations at windows of best opportunity: schedule flexibility and independence from constraints like changes to the mission model and ESTRACK load.
• Network design as overlay or side by side on standard, highly economical leased lines has avoided extra cost for two networks.
• The TCP/IP protocol suite standards continue to evolve; those of X.25 do not.
• The communications network is just that. It can offer different classes of throughput and priority, but control of the load that the “user” system offers to the network has to reside in the “user” system.
• The design of an e2e communications service has to be understood by all vertical layers involved in source-destination relations.