Optimised redundancy for Security Gateway deployments
2 Copyright © 2011 Juniper Networks, Inc. www.juniper.net
Customer Priorities
Juniper LTE Solution
RECAP:- JUNIPER LTE SECURITY OFFERING
SRX Security
Service Gateways
• 120G FW
• 30G IPS
• 10M Sessions
• 350k SPS
• 21M pps (64B)
• TL 9000 certification
• In Service SW Upgrades
• NEBS III / DC Power
• CC EAL
• Hot Swap I/O Cards
• ICSA
RAN and UE protection
Secure business and access to all services from any to any
Mission critical availability
Voice over LTE
SCTP protection
Scalability
Core elements protection
Coordinated protection
• IP & GTP & SCTP Firewall
• QoS
• DoS Protection
• IPv6
• IPSec
• High Availability
RESILIENCY CONSIDERATIONS FOR LTE/SEGW
eUTRAN
S-GW
MME
Evolved
Packet
Core
Cell sites
Services/Internet Security
Gateway
Catastrophic Act of Nature/Criminality/Terrorism
Geographic site distribution
Highly available Security Gateway
Clustered mode with IPSec tunnel and S1-U/S1-MME session synchronisation
Redundant everything
Inter-node cluster links, power feeds and PSUs, physical SeGWs
Fast failover for latency-sensitive services like VoLTE
Provide lowest possible failover times, under 0.5s
Maintain signalling
Ensure SeGW does not cause problems with common signalling failover times (800ms)
Node maintenance
Firmware and hardware upgrades with near-zero downtime
Aggregation site 2
Aggregation site 1
ANATOMY OF A REDUNDANT SOLUTION
BACKHAUL
S-GW P-GW
MME
Cell-site
Core Site
Geographic distribution
Requires inter-site L2 connectivity
No hard distance limitations
Latency between sites must be less than 100ms
Redundant HA links
Dual links for control and data plane HA
Separate physical paths for best redundancy
SRX5800 Rear
2+2 Redundant power
Dual power supplies on dual zones per site
Resilient against loss of 1 entire feed or 2 PSUs
High Availability
Synchronisation of IPSec SAs for rapid failover
Failover time commonly ~1s
L3 redundancy
BFD used to provide link failover at L3 in ~300ms
Mitigates loss of adjacent routers or links
Active/Active VPN
Split SCTP signalling for dual-homed nodes
SCTP handles subsecond signalling failover
GEOGRAPHIC CLUSTER DISTRIBUTION
Site A Site B
L2
Infrastructure
Cluster
Jurisdiction
HA Links
Mitigate catastrophic event by distributing SeGW cluster members between physical sites with L2 connectivity (required)
No hard maximum distance
Latency between sites should be less than 100ms
HA connections can be directly cabled or over a switched infrastructure
Appnote enclosed explains design guidelines
MULTIPLE HA LINKS
Dual links can be used for control and forwarding plane (Fabric) HA
Maximum availability of cluster links across distributed sites
Requires additional Routing Engine (RE) per node for dual control links
2 I/O ports per node required for dual Fabric links (1Gbps or 10Gbps)
Should be cabled over separate physical paths/infrastructures for greatest resilience
Node 0 Control plane
Node 0 Dataplane
Node 1 Control plane
Node 1 Dataplane
SRX Node 0 SRX Node 1
Separate physical paths between sites
REDUNDANT POWER OPTIONS
Fully redundant, 2+2 power (DC or high-capacity AC) available
Dual zones on SRX (as above)
Dual power feeds in aggregation site should be distributed across zones
E.g. Feed 1 goes to PEM 0 and PEM 1, Feed 2 to PEM 2 and PEM 3
SRX continues to function fully through loss of
An entire single power feed
Up to 2 PSUs, provided they are in different zones
Power feed 2
Power feed 1
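The power rules above can be checked with a small model. This is a minimal sketch, not Juniper tooling: the feed-to-PEM mapping is the slide's own example, while the zone membership in `ZONES` and the rule that each zone needs at least one live PEM are assumptions made for illustration.

```python
# Feed distribution taken from the slide's example.
FEEDS = {"feed1": {"PEM0", "PEM1"}, "feed2": {"PEM2", "PEM3"}}

# ASSUMPTION: zone membership is not stated in the slides; this layout is illustrative.
ZONES = {"zone0": {"PEM0", "PEM2"}, "zone1": {"PEM1", "PEM3"}}


def chassis_powered(failed_pems=(), failed_feeds=()):
    """Return True if every zone still has at least one live PEM (assumed rule)."""
    dead = set(failed_pems)
    for feed in failed_feeds:
        dead |= FEEDS[feed]  # a lost feed takes down every PEM it supplies
    return all(pems - dead for pems in ZONES.values())


if __name__ == "__main__":
    print(chassis_powered(failed_feeds=["feed1"]))        # one entire feed lost
    print(chassis_powered(failed_pems=["PEM0", "PEM3"]))  # two PSUs, different zones
    print(chassis_powered(failed_pems=["PEM0", "PEM2"]))  # two PSUs, same zone
```

Under these assumptions, spreading each feed across both zones is what lets the chassis ride out the loss of a whole feed.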
8 Copyright © 2009 Juniper Networks, Inc. www.juniper.net
HIGH AVAILABILITY:- CORE FUNCTIONALITY
JUNOS HA provides a number of core resilience functions on SeGW
Synchronisation of IPSec SAs
No tunnel re-establishment = minimal downtime for SeGW failover
Synchronisation of underlying clear-text sessions – SCTP and GTP
Allows for stateful security and HA for SCTP signalling
ISSU (In-Service Software Upgrades)*
Upgrade JUNOS with minimal downtime (potentially subsecond)
SPC capacity upgrade
Scale performance with minimal downtime (potentially subsecond)
IPSec tunnels
IPSec SA and session sync
*IPSec support for ISSU coming 2H2012
OPTIMISED L3 FAILOVER
EPC
RAN
OSPF/BFD adjacency
EPC
L3 forwarding interface (Reth)
Site A Site B
Use 2 x L3 links up and down stream for optimised failover
BFD (+DRP) runs between SRX and adjacent aggregation/PE routers
Loss of aggregation/PE router or a link causes L3 route failover
HA failover occurs only if both L3 interfaces (up or down stream) on a node are down
Failover with BFD occurs with an absolute downtime of ~350ms
Ideal for high priority traffic requirements, e.g. VoLTE
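The escalation rule on this slide (reroute at L3 first, fail over the cluster only when a whole direction is lost) can be written as a small decision function. This is an illustrative sketch only; the function name and return labels are hypothetical and do not correspond to any JUNOS API.

```python
def react_to_link_state(upstream_up, downstream_up):
    """Decide the reaction for one node.

    upstream_up / downstream_up: one boolean per L3 link (True = link up).
    """
    if not any(upstream_up) or not any(downstream_up):
        # Both links in one direction are down: the node cannot forward any more.
        return "ha_failover"
    if not all(upstream_up) or not all(downstream_up):
        # One link lost: BFD detects it and routing moves to the surviving link.
        return "l3_reroute"
    return "no_action"


if __name__ == "__main__":
    print(react_to_link_state([True, False], [True, True]))
    print(react_to_link_state([False, False], [True, True]))
```

Keeping single-link losses at L3 avoids the slower cluster failover for the most common failure cases.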
OPTIMISED L3 FAILOVER – IPSEC TERMINATION
L3 ingress IP changes as interface fails over
Needs an agnostic logical interface for IPSec termination
‘Loopback Reth’
A physical interface is kept up with a local loop cable
Used as the outgoing interface for IKE negotiation – but no traffic traverses the looped cable
Can be 1Gbps or 10Gbps – no forwarding needed
Can be migrated to logical loopback from JUNOS 12.3 (loopback currently not supported for IPSec termination in cluster mode)
IKE/IPSec termination point
Loopback cable
SRX
Aggregation router (site A)
Aggregation router (site B)
NB Logical view only, SRX cluster not shown
Possible IPSec tunnel paths
L3 interfaces
SIGNALLING OPTIMISATION
MME
eNB
The problem:-
• SCTP signalling applications typically fail over in 800ms or less
• For dual-homed signalling, primary AND secondary paths could both fail in 1.6s
• Under certain conditions, SeGW HA failover takes > 1.6s
• HA failover could lead to complete loss of signalling
The solution:-
• Split the primary and secondary SCTP sessions, both from a RAN path perspective and from an SeGW termination point perspective
• Use Active/Active HA and divide the homing across cluster members
Association setup (INIT exchange) + primary SCTP path
Secondary SCTP path
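The timing argument above can be made concrete with a little arithmetic. The sketch below is a simplified model under stated assumptions: each SCTP path is declared failed after the 800ms quoted on the slide, paths are given up one after another, and the function name and parameters are hypothetical.

```python
SCTP_PATH_FAILOVER_S = 0.8  # per-path failover time quoted on the slide


def association_survives(ha_failover_s, paths_via_failed_node, total_paths=2,
                         path_failover_s=SCTP_PATH_FAILOVER_S):
    """Return True if the SCTP association outlives an SeGW HA failover."""
    if paths_via_failed_node < total_paths:
        # Active/Active split: at least one path terminates on the surviving member.
        return True
    # Every path is blocked until HA failover completes; SCTP gives up on them one
    # after another, so the association is lost once the outage exceeds the sum.
    return ha_failover_s <= total_paths * path_failover_s


if __name__ == "__main__":
    print(association_survives(2.0, paths_via_failed_node=2))  # both paths on one node
    print(association_survives(2.0, paths_via_failed_node=1))  # paths split across members
```

With both paths homed on the failing node, a failover longer than 2 x 0.8s = 1.6s loses the association; splitting the paths removes that dependency on HA failover time.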
SIGNALLING RESILIENCE WITH ACTIVE/ACTIVE HA
S-GW MME
MS
eNB
RAN
User plane
Primary SCTP
Secondary SCTP
VPN B VPN A
SCTP dual-homed association split down dual IPSec tunnels
In case of loss of primary path or primary SeGW, signalling fails over to secondary VPN
Secondary VPN always up
Signalling timers (~800ms) are catered for
User plane is not rerouted to secondary VPN
Assumes failover time (1-3s) is acceptable for user plane
SIGNALLING RESILIENCE WITH ACTIVE/ACTIVE HA – FAILOVER WALKTHROUGH
S-GW MME
MS
eNB
RAN
User plane
Primary SCTP
Secondary SCTP
VPN B VPN A
1 Normal operating conditions – user plane and primary SCTP through RG1, secondary SCTP through RG2
2 RG1 failure (e.g. SRX loses power). User plane forwarding and primary SCTP path lost
3 RG1 begins to fail over; SCTP detects path down and uses secondary path
4 Failover completes, RG1 and RG2 active on same node. User plane traffic resumes
5 Primary signalling path recovered through SCTP heartbeats. HA preemption can optionally be configured to fail back
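The five steps can be traced as a sequence of states. This is a sketch for illustration only: the node names `node0` and `node1` are hypothetical, and the model assumes the failed node has recovered by step 5 when preemption is enabled.

```python
def failover_walkthrough(preempt=False):
    """Return a list of (step, rg_owner, user_plane_up, sctp_path_in_use)."""
    # RG1 carries user plane + primary SCTP; RG2 carries secondary SCTP.
    owner = {"RG1": "node0", "RG2": "node1"}  # hypothetical node names
    steps = []

    def record(step, user_plane_up, sctp_path):
        steps.append((step, dict(owner), user_plane_up, sctp_path))

    record(1, True, "primary")     # normal operation
    owner["RG1"] = None            # node hosting RG1 fails
    record(2, False, None)         # user plane and primary SCTP path lost
    record(3, False, "secondary")  # SCTP detects path down, uses secondary
    owner["RG1"] = "node1"         # HA failover completes
    record(4, True, "secondary")   # user plane resumes on the surviving node
    if preempt:
        owner["RG1"] = "node0"     # ASSUMPTION: node0 is healthy again
    record(5, True, "primary")     # heartbeats restore the primary path
    return steps


if __name__ == "__main__":
    for step in failover_walkthrough():
        print(step)
```

The trace shows that signalling is only interrupted between steps 2 and 3, while the user plane waits for the full HA failover in step 4.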
A/A ADDITIONAL BENEFIT:- SCTP ALG
MME
RAN
Primary SCTP
Secondary SCTP
IP B IP A
SCTP Association is synchronised across cluster
Possible sessions for a given association are clearly defined by src/dst IP addresses in the INIT exchange
Turning on SCTP ALG allows SCTP to be handled statefully
Prevents any potential attacks listed in RFC 5062, e.g. hijacking, bombing
IP C
IP D
Init exchange
SCTP Association
SIP=A,B
DIP=C,D
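The idea that the INIT exchange fixes the set of legitimate paths can be sketched as follows. This is not how the SRX SCTP ALG is implemented; it only illustrates, with hypothetical function names and the slide's placeholder addresses A to D, why a stateful device can reject packets from addresses that were never announced.

```python
from itertools import product


def allowed_paths(init_src_ips, init_dst_ips):
    """Build every (src, dst) pair permitted by the addresses seen in the INIT exchange."""
    forward = set(product(init_src_ips, init_dst_ips))
    reverse = {(dst, src) for src, dst in forward}  # return traffic
    return forward | reverse


def packet_permitted(src, dst, paths):
    """A packet belongs to the association only if its address pair was announced."""
    return (src, dst) in paths


if __name__ == "__main__":
    paths = allowed_paths({"A", "B"}, {"C", "D"})  # SIP=A,B DIP=C,D from the slide
    print(packet_permitted("B", "D", paths))  # secondary path between announced addresses
    print(packet_permitted("E", "C", paths))  # source address never announced
```

Because the permitted pairs are known up front, the same association state can be synchronised to the other cluster member and enforced there after a failover.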
USER PLANE FAILOVER WITH DUAL TUNNEL
User plane failover requires a mechanism to detect that the tunnel is down (or not passing traffic due to a problem in the path)
This could be DPD
Tends to have long timers which do not facilitate rapid failover
30s+ common for DPD to detect tunnel down
Checks tunnel liveness only via IKE (does not extend to forwarding plane checking)
Could also be a dynamic routing protocol (DRP)
Not necessarily supported on eNBs
FUTURE FOR TUNNEL FAILOVER – BFDoIPSEC?
BFD could offer a solution
Could be run in conjunction with static routes
Granular timing options for BFD keepalives
50ms is typical minimum
Can give high speed failover between tunnels including user plane
Currently supported over IPSec on SRX
Not supported on all (any?) base stations today, but planned*
*caveat:- Juniper is not a basestation vendor, this is what we have heard!
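To put the BFD timers in perspective against DPD, the sketch below uses the simplified rule that BFD declares a session down after a number of consecutive missed keepalives. The multiplier of 3 is an assumption (it is not given in the slides), and the 30s DPD figure is the "30s+ common" value quoted earlier in this deck.

```python
DPD_DETECTION_MS = 30_000  # "30s+ common" for DPD, per the slides


def bfd_detection_ms(tx_interval_ms, detect_multiplier=3):
    """Simplified BFD detection time: interval x number of missed keepalives.

    ASSUMPTION: detect_multiplier=3 is illustrative, not taken from the slides.
    """
    return tx_interval_ms * detect_multiplier


if __name__ == "__main__":
    bfd_ms = bfd_detection_ms(50)  # 50ms is the typical minimum interval
    print(f"BFD detects tunnel loss in about {bfd_ms} ms")
    print(f"DPD at 30s is about {DPD_DETECTION_MS // bfd_ms}x slower")
```

Even with a conservative multiplier, detection stays well under the sub-second budget discussed for signalling, which is why BFD over IPSec is attractive for user plane failover.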
GEOGRAPHIC MIGRATION OF SEGW
SeGW deployments tending towards a large scale centralised deployment
A more distributed architecture has advantages
More efficient X2 transport
Minimal impact of SeGW node failure
Lower performance requirements per node
Loopback termination of IPSec VPNs could offer a simple migration path in conjunction with A/A
Dual tunnels could exist on different clusters during migration
MME S-GW
One VPN migrated; traffic failed over; 2nd VPN migrated
SEGW:- REDUNDANCY SUMMARY MATRIX
Requirement | Solution component | Notes
Redundant power | 2+2 PSUs | Dual feeds per site required
Redundant HA links | Dual control/dual data plane HA links | Link pairs should traverse disparate paths
High Availability | SRX cluster | Provides IPSec SA and session synchronisation
Fast failover at L3 | Dual L3 links with BFD | Mitigates loss of adjacent routers or links
Signalling failover | Active/Active dual tunnel | Design may not be supported by all radio vendors
Geographic redundancy | Dispersed cluster | L2 needed between sites
PROVISIONING – AUTO CONFIGURATION PROTOCOL WORKFLOW
eNodeB SGW
DHCP (can be coresident on SRX)
RELAY
Conf Server
PKI – BE
PKI – FE 1
PKI – FE 2
REBOOT
Create Temporary IPSec Tunnel (Initial Tunnel)
DHCP: eNB-IP@ & operator specific CA-IP@ / SEG-IP@ / NetAct-IP@
Authenticate to Operator’s CA with eNB vendor Certificate & key signing request
Create, sign & download operator’s eNB Certificate
Create Permanent IPSec Tunnel (Permanent Tunnel)
JUNIPER SRX AS SEGW:- INVESTMENT PROTECTION AND FUTURE SCALE
• Up to 8x jump in scale
• Headroom for future growth
• 2x-3x boost in performance
• Redundant components
• Stateful HA
• In-service SW upgrade
• In-service HW upgrade
• Backward compatible - Low upgrade cost
• Operational Simplicity – No change to security config
Investment Protection
Non-stop services
Scale Performance
Hardware Refresh:- Key points
Next-generation SPC