Network Move & Upgrade 2008/2009: October 2008

Les Cottrell, SLAC, for the SCCS core services network group (Antonio Ceseracciu, Jared Greeno, Yee Ting Li, Gary Buhrmaster). Presented at the OU Admin Group Meeting, October 16, 2008

www.slac.stanford.edu/grp/scs/net/racks/netmove-oct08.ppt

Why move
• ~70 building switches are connected to the old core switch, which has to “move” to a seismically retrofitted area
• While at it, replace old, beyond-end-of-life, limited-capability switches to provide better service

Move Types
• Already done: Kavli, MCC, LCLS, SSRL (70 switches, 17 need replacing, will probably need to re-address later, SSRL decision)
• Migrate: switch beyond end of life, features missing (auto-negotiation, higher speeds) = replace switch, connect to new core, re-address hosts
– (CGB1), TL1, (WHS), CLA1, CLA2, 280, CL1..2, B267, CGB3
• Move – 1: switch OK = use same switch but connect to temporary core switch, re-address later (after April 15th, 2009)
– B214, B031, B210, B005, B275, B279, CLR113, CLR224, CLR343, HFB1, HFB2, MCC-CORE1..2, MCC-WAPCORE1..2, ROB, Research Yard: SWH-RY, B062, B104A, B113, B121, B124, B128, B211, B225, B231, B420
• Move – 2: switch beyond end of life etc., but not central responsibility to upgrade = connect to temporary core switch
– Guest House has 2, PEP ring has 4 but ring is de-commissioned at the moment
• Move – 3: switch shares trunk cable, requires long (2 days) workday outage, or overtime (cost depends on what cables have to move etc.; estimated cost probably $5K: 2 technicians for 2 days)
– Guest House 1 & 2, ESA, CRYO, IR12, CGB2 (1 day).
– Will send an email to OU Admins with a heads-up so they can contact and warn users, get an account if non-working hours are needed, and schedule.
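The move types above amount to a small decision procedure. The sketch below is one possible reading of it, not the group's actual tooling: the `Switch` fields and the order in which the checks are applied are assumptions (the slide lists some switches, such as the Guest House ones, under more than one type).

```python
from dataclasses import dataclass


@dataclass
class Switch:
    """Hypothetical record; field names are illustrative, not from CANDO."""
    name: str
    already_moved: bool = False       # e.g. Kavli, MCC, LCLS, SSRL
    end_of_life: bool = False         # beyond end of life / missing features
    central_responsibility: bool = True  # is the upgrade a central responsibility?
    shares_trunk_cable: bool = False  # move needs a long outage or overtime


def move_type(sw: Switch) -> str:
    """Return the move type for a switch (assumed check order)."""
    if sw.already_moved:
        return "Already done"
    if sw.shares_trunk_cable:
        return "Move-3"   # long outage, to be scheduled with users
    if not sw.end_of_life:
        return "Move-1"   # keep switch, connect to temporary core
    if sw.central_responsibility:
        return "Migrate"  # replace switch, new core, re-address hosts
    return "Move-2"       # old switch stays, connect to temporary core
```

For example, `move_type(Switch("TL1", end_of_life=True))` gives `"Migrate"` and `move_type(Switch("B214"))` gives `"Move-1"`, matching where those switches are listed above.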

Long Outage Switches
• Contact users and group leaders to see whether they can take the outage in normal work hours or get an account for overtime (could be $5K); schedule outage
– ESA (21 hosts): Tyler Adams (11), Nicholas Arias (2), Rafael Gomez (5), Zen Szalata (3)
– Cryo (7 hosts): Agustin Burgos (5), Tom Galeto (2)
– IR12 (4 hosts): Tala Cadorna (1), Raymond Lo (3)
• www.slac.stanford.edu/grp/scs/net/racks/slaconly/switches/ gives details of hosts on switches

Experience with Moves
• Moves are easy:
– Each building switch has two (for redundancy) fibre pairs to two old core routers on B050 floor 2
– Prepare a port in each of 2 temporary (probably ~1 year) switches in seismically retrofitted area
– Identify pairs and prepare jumpers
– Move backup pair to backup temporary switch
– Move primary pair to primary temporary switch
– Two ~5 second outages, users unlikely to notice
• No need for detailed coordination with OU admins or users; can do whenever we get to it
• Could publish a schedule in future to all OU admins, but that will require more effort and scheduling; easier to notify when done, or 5 mins before we do it
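The move procedure above is an ordered checklist with the backup pair moved before the primary. A minimal sketch of that ordering follows; the function names are hypothetical, and the outage figure simply reuses the "~5 second" estimate from the slide.

```python
CUTOVER_OUTAGE_SECONDS = 5  # "~5 second" estimate per fibre-pair move


def plan_move(switch: str) -> list:
    """Ordered steps for moving one building switch to the temporary core."""
    return [
        f"{switch}: prepare a port on each temporary core switch",
        f"{switch}: identify both fibre pairs and prepare jumpers",
        f"{switch}: move backup pair to backup temporary switch",
        f"{switch}: move primary pair to primary temporary switch",
    ]


def expected_outage_seconds(cutovers_per_switch: int = 2) -> int:
    """Rough total interruption per switch: one short outage per pair moved."""
    return cutovers_per_switch * CUTOVER_OUTAGE_SECONDS
```

Calling `plan_move("B214")` returns the four steps in order, and `expected_outage_seconds()` gives roughly 10 seconds of total interruption per switch under these assumptions.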

Migrations
• Require re-addressing & close coordination
• Identify admins (can be many), switch ports etc.; create web page documenting what has to be done and the addresses; set up tracking tickets etc.
• Email admins requesting that they validate CANDO info and read web page:
– Three types of hosts: printers, SLAC only, open access to world.
• Meet with admins, explain, schedule time
• Install replacement switch when appropriate, configure
• With each admin, a network tech and a network engineer move cables one by one from old switch to replacement, re-address host, check things work etc.
• During or shortly after migration, network engineer will update CANDO with new IP address.
• To date, have been migrating all of one OU Admin’s machines at a time.
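Since migrations proceed one OU Admin at a time and hosts fall into three types, the per-admin worklist could be built roughly as below. This is only a sketch: the record layout (`name`, `admin`, `type`) and the type labels are assumptions, not the CANDO schema.

```python
from collections import defaultdict

# The three host types named on the slide; the labels are illustrative.
HOST_TYPES = {"printer", "slac-only", "open"}


def worklist_by_admin(hosts):
    """Group host records by admin, ordered by host type and then name.

    `hosts` is an iterable of dicts with hypothetical keys
    'name', 'admin' and 'type'.
    """
    grouped = defaultdict(list)
    for host in hosts:
        if host["type"] not in HOST_TYPES:
            raise ValueError(f"unknown host type for {host['name']}: {host['type']}")
        grouped[host["admin"]].append(host)
    return {
        admin: sorted(items, key=lambda h: (h["type"], h["name"]))
        for admin, items in sorted(grouped.items())
    }
```

Rejecting unknown types up front reflects the point that the host type should be validated with the admin before the cables are moved.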

Migration Experience
• Two switches almost done (CGB1, TL1), elapsed > 1 week

• Difficult, labor intensive, requires lots of coordination, availability, impacts users

• Problems with devices not being in the documented place, patch panel labeling being wrong, patch cables not being long enough

• Be wary of old, non-standard devices

• Devices that have been turned off do not show up on our spreadsheets

• Takes time to get print queues changed on Windows, but can be requested in advance

• Will be setting a hard deadline depending on # devices etc.

Lessons learned
• New networks require different subnet mask and default gateway; make sure this is clear.
• Make sure all devices have an IP assigned in advance to reduce confusion.
• Confirm which devices should be SLAC Only (IFZ) vs Public in advance.
• When replacing the switch, can take up to 15 minutes per device (walk to machine, log in, change IP, request cable change, test), so be prepared and patient.

• Use ipconfig /registerdns on Windows computers to make sure Windows DNS gets updated, then test and inform windows-admin if IP is still wrong.

• Still working on developing automation to change Windows system IPs.
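Two of the lessons above lend themselves to a quick check before migration day: that a host's new IP, subnet mask and default gateway agree with each other, and how long a switch will take at up to 15 minutes per device. The sketch below uses Python's standard `ipaddress` module; the function names are hypothetical and the addresses in the usage note are documentation-range examples, not SLAC subnets.

```python
import ipaddress


def check_host_settings(ip: str, netmask: str, gateway: str) -> list:
    """Return a list of problems with a host's planned IP settings."""
    problems = []
    # strict=False lets us derive the network from a host address plus mask.
    network = ipaddress.ip_network(f"{ip}/{netmask}", strict=False)
    host = ipaddress.ip_address(ip)
    gw = ipaddress.ip_address(gateway)
    if gw not in network:
        problems.append(f"gateway {gw} is outside {network}")
    if host in (network.network_address, network.broadcast_address):
        problems.append(f"{host} is the network or broadcast address of {network}")
    if host == gw:
        problems.append(f"{host} is the same as the gateway")
    return problems


def migration_minutes(devices: int, minutes_per_device: int = 15) -> int:
    """Upper-bound estimate using the 'up to 15 minutes per device' figure."""
    return devices * minutes_per_device
```

With these assumptions, `check_host_settings("192.0.2.10", "255.255.255.0", "192.0.2.1")` reports no problems, while keeping an old, narrower mask such as `255.255.255.128` with gateway `192.0.2.129` is flagged. `migration_minutes(20)` gives 300 minutes, i.e. up to five hours for a 20-device switch.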

Progress: Temporary switches
• CORE3OLD in seismically retrofitted area – Sep 08. Need reconfig and connect up & CORE4OLD
• CORE4OLD in seismically retrofitted area – Oct 08. CORE4OLD in place too

Documentation
• See “Seismic Retrofitting Rack Move 2008” site
– https://confluence.slac.stanford.edu/display/NetMan/Seismic+Retrofitting+Rack+Move+2008
– Contains background information, overview of procedures, milestones, drill down to lots more details (tickets, spreadsheets, subnet allocations, hosts on individual switches etc.)
– This is where to go to get detailed information. It is very dynamic.
– If you need more, let us know and we will add as appropriate
• Email to core-neteng
• There is an FAQ at https://confluence.slac.stanford.edu/display/NetMan/Frequently+Asked+Questions

New Area
• New area circa Aug 21 ‘08

• New area circa Sep 18 ‘08

• New area circa Oct 15 ‘08

Central Routers
• SWH-CORE1&2-OLD in old racks
• CORE3OLD in seismically retrofitted area – Sep 08. Need reconfig and connect up & CORE4OLD
