NSI Registry Engineering & Operations Update

Ari Balogh, VP of Engineering
[email protected]
12-Jun-2000



High-Level Architecture

[Architecture diagram. Labeled components: Registration System; CSRs; Root, gTLD Node; RRP Proxy; Regr. Tools; App. Server; Domains, NSs, Registrars; Whois Server; Network Solutions Registry; Whois; DNS Zones; Regr. Reports; CSR Tools; Internet Users; Firewall; Registrars. Labeled connections: HTTP, RRP/SSL.]

Registrar Growth

Welcome letter sent to Registrar candidates     31
Registrars in pre-production                    48
Registrars in production                        44
Total number of Registrars in Registry         123
ICANN accredited Registrars                    123

[Chart: Total Number of Names in the Registry Database]

Average Daily Transactions

In millions, compared to Original Plan and New Projections (peak of 27.5M)

                     4Q99   1Q00   2Q00   3Q00   4Q00
Plan of Record        1.2    1.4    1.6    1.9    2.2
1/1/00 Projection     2.8    4.5    5.2    6.0    7.0
Actual                2.8    5.6   19.4

Qtr to 5/31
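The gap between actual load and the two forecasts can be expressed as simple ratios. The sketch below uses only the figures on the slide; reading "Qtr to 5/31" as applying to the 2Q00 actual is an assumption, and the names are illustrative.

```python
# Average daily transactions in millions, transcribed from the slide.
QUARTERS = ["4Q99", "1Q00", "2Q00"]
PLAN_OF_RECORD = [1.2, 1.4, 1.6]
PROJECTION_1_1_00 = [2.8, 4.5, 5.2]
ACTUAL = [2.8, 5.6, 19.4]  # assumption: the 2Q00 figure is the quarter to 5/31


def ratios(actual, baseline):
    """Return actual/baseline for each quarter, rounded to one decimal."""
    return [round(a / b, 1) for a, b in zip(actual, baseline)]


vs_plan = ratios(ACTUAL, PLAN_OF_RECORD)
vs_projection = ratios(ACTUAL, PROJECTION_1_1_00)

for quarter, p, j in zip(QUARTERS, vs_plan, vs_projection):
    print(f"{quarter}: {p}x Plan of Record, {j}x 1/1/00 Projection")
```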

Total Transactions Summary

In millions

           Oct    Nov    Dec    Jan    Feb  March  April    May
Write      1.6    4.5    5.0    4.0    4.5    4.5    8.1    8.4
Query      0.4    3.3    4.4    2.6    3.8    4.2   15.4   18.0
Check     29.2   38.6   77.7  113.7  151.6  212.2  518.6  616.9

Month-over-month change in total (chart labels): Nov 49%, Dec 88%, Jan 38%, Feb 33%, March 38%, April 145%, May 19%
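The percentage labels on this chart are consistent with month-over-month growth in total transactions (Write + Query + Check). A minimal sketch that recomputes them from the monthly figures on the slide; the variable names are illustrative.

```python
# Monthly transaction counts in millions, transcribed from the slide.
MONTHS = ["Oct", "Nov", "Dec", "Jan", "Feb", "March", "April", "May"]
WRITE = [1.6, 4.5, 5.0, 4.0, 4.5, 4.5, 8.1, 8.4]
QUERY = [0.4, 3.3, 4.4, 2.6, 3.8, 4.2, 15.4, 18.0]
CHECK = [29.2, 38.6, 77.7, 113.7, 151.6, 212.2, 518.6, 616.9]

# Total transactions per month.
totals = [w + q + c for w, q, c in zip(WRITE, QUERY, CHECK)]

# Growth of each month over the previous one, as a whole percentage.
growth_pct = [
    round((cur / prev - 1) * 100)
    for prev, cur in zip(totals, totals[1:])
]

for month, pct in zip(MONTHS[1:], growth_pct):
    print(f"{month}: {pct}%")
```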

Availability & Performance

• Service Level Agreement (SLA) allowances:
  – 8 hours total outage per month, 4 hours unplanned
  – 3 seconds average for check domain (excluding worst 5%)
  – 5 seconds average for add domain (excluding worst 5%)
• January observed performance:
  – 3.5 hours planned outage to implement governance issues, no unplanned
  – 600 ms per check domain, 2.5 seconds per add
• February observed performance:
  – No planned or unplanned outages
  – 700 ms per check domain, 2.6 seconds per add

Availability & Performance

• March observed performance:
  – Two 2 hour planned outages, 1.25 hour unplanned outage
  – 60 ms per check domain, 300 ms per add
• April observed performance:
  – 2.5 hours planned outage, no unplanned
  – 78.5 ms per check domain, 319.5 ms per add
• May observed performance:
  – 2 hours planned outage, no unplanned
  – 34.7 ms per check domain, 257.2 ms per add
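The January through May observations can be checked against the SLA allowances mechanically. The sketch below rests on two assumptions not stated on the slides: that "8 hours total, 4 hours unplanned" means planned plus unplanned outage of at most 8 hours with at most 4 of them unplanned, and that the observed response times are averages on the same basis as the SLA (worst 5% excluded). The type and function names are illustrative.

```python
from dataclasses import dataclass

# SLA allowances from the slide (hours per month, milliseconds per request).
SLA_TOTAL_OUTAGE_H = 8.0
SLA_UNPLANNED_OUTAGE_H = 4.0
SLA_CHECK_MS = 3000.0
SLA_ADD_MS = 5000.0


@dataclass
class Month:
    name: str
    planned_h: float
    unplanned_h: float
    check_ms: float
    add_ms: float


# Observed figures transcribed from the slides.
OBSERVED = [
    Month("January", 3.5, 0.0, 600.0, 2500.0),
    Month("February", 0.0, 0.0, 700.0, 2600.0),
    Month("March", 4.0, 1.25, 60.0, 300.0),  # two 2-hour planned outages
    Month("April", 2.5, 0.0, 78.5, 319.5),
    Month("May", 2.0, 0.0, 34.7, 257.2),
]


def within_sla(m: Month) -> bool:
    """True if the month meets every allowance (under the assumptions above)."""
    return (
        m.planned_h + m.unplanned_h <= SLA_TOTAL_OUTAGE_H
        and m.unplanned_h <= SLA_UNPLANNED_OUTAGE_H
        and m.check_ms <= SLA_CHECK_MS
        and m.add_ms <= SLA_ADD_MS
    )


for m in OBSERVED:
    status = "within SLA" if within_sla(m) else "SLA breach"
    print(f"{m.name}: {m.planned_h + m.unplanned_h} h outage, {status}")
```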

A Root Performance - UDP Packets/Second

[Graphs: 5 Minute Average; 30 Minute Average]

A Root Performance - Drops & Overflows

[Graphs: Drops - 5 Minute Average; Overflows - 5 Minute Average]

J gTLD Performance - UDP Packets/Second

[Graphs: 5 Minute Average; 30 Minute Average]

M gTLD Performance - UDP Packets/Second

[Graphs: 5 Minute Average; 30 Minute Average]

The Infrastructure Problem

• SLA that incurs $500K/day outage and performance penalties
• Single shared database experiencing 30% - 90% per month OLTP growth
  – Heavyweight stored procedures
  – Sustained 50%-70% utilization with peaks to 100% … and no more easy software fixes
  – Increasing extract duration for zones, Whois, registrar extracts, 5 - 14 hours
• Immature or end-of-life HA options for E4500
• Sun, Veritas, EMC version and support issues
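To put the growth figures in perspective, compounding monthly growth of 30% to 90% gives a short doubling time, and a server already at 50% to 70% sustained utilization has little room left. The sketch below is a rough estimate only: it assumes growth compounds monthly and that utilization scales linearly with OLTP load, neither of which is stated on the slide.

```python
import math


def doubling_time_months(monthly_growth: float) -> float:
    """Months for load to double at a compounding monthly growth rate."""
    return math.log(2) / math.log(1 + monthly_growth)


def months_until_saturated(utilization: float, monthly_growth: float) -> float:
    """Months until utilization reaches 100%, assuming it tracks load linearly."""
    return math.log(1.0 / utilization) / math.log(1 + monthly_growth)


for growth in (0.30, 0.90):
    print(f"{growth:.0%}/month: load doubles in "
          f"{doubling_time_months(growth):.1f} months")
    for util in (0.50, 0.70):
        print(f"  from {util:.0%} utilization: saturated in "
              f"{months_until_saturated(util, growth):.1f} months")
```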

DB Server Evaluation

• Evaluated top Unix machines
  – Sun E10000, HP V2500, IBM S7A/S80
• Narrowed to E10000 and S7A/S80
• Conducted three-month live test of S7A/S80
  – Ported gateway and application servers to IBM Java environment
  – Created RRP path configuration
  – Demonstrated performance and availability (HA/CMP)
• Investigated impacts of E10K
  – Different administrative model
  – EMC integration issues

Definitive Results

• Excellent Java and C code portability
• S80 clear performance leader, benchmarks and real-world
  – Approximately 3 times the throughput per CPU vs. E10K
  – Noticeably improved Java performance (!)
• Robust HA implementation
• Complete 64-bit environment
• Native file system and volume management; excellent EMC integration
• Impressive and thorough support
  – Demonstrated appreciation for multi-vendor, mission critical computing

Scaling DNS

• Domain name resolutions on A Root
  – 4Q99 - 220M per day
  – 1Q00 - 430M per day
  – 2Q00 - 650M per day
  – 4Q00 - 1.5B per day, more?
• Need 64-bit machines to scale past 4GB/23M domain name wall
• Developing BIND extensions for high performance gTLD
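The daily resolution counts and the 4GB/23M wall can be turned into per-second and per-name figures. This is back-of-the-envelope arithmetic under two assumptions: that "4GB" is the full 32-bit address space (2^32 bytes), and that the wall is reached when zone data for 23M names fills it. The per-name figure is therefore an implied average, not a measured value, and the 4Q00 count is the slide's own projection.

```python
SECONDS_PER_DAY = 86_400

# Resolutions per day on A Root, transcribed from the slide.
RESOLUTIONS_PER_DAY = {
    "4Q99": 220e6,
    "1Q00": 430e6,
    "2Q00": 650e6,
    "4Q00": 1.5e9,  # projected on the slide ("more?")
}

# Average queries per second implied by each daily total.
avg_qps = {q: n / SECONDS_PER_DAY for q, n in RESOLUTIONS_PER_DAY.items()}

# Assumption: the 4GB limit is the 32-bit address space.
ADDRESS_SPACE_32BIT = 2 ** 32
DOMAINS_AT_WALL = 23_000_000
bytes_per_domain = ADDRESS_SPACE_32BIT / DOMAINS_AT_WALL

for quarter, qps in avg_qps.items():
    print(f"{quarter}: about {qps:,.0f} resolutions/second on average")
print(f"Implied memory per domain name: about {bytes_per_domain:.0f} bytes")
```

Averages understate peak load, so the provisioning target would be higher than these per-second figures.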

64-bit DNS Evaluation

• Engaged Unix vendors to aid with in-house evaluation of 64-bit mid-range Unix servers
  – HP N4000, IBM H70, Sun E3500
• E3500 eliminated early -- scale and 64-bit issues
• H70 within 15% of N4000, upcoming upgrade substantially faster
• Chose M80 as new root/gTLD platform
• Using E4500s as alternate platform and placeholder for UltraSparcIII generation

The Dot Problem

Resolutions per day. A Root meltdown?

[Chart: vertical axis 0 to 250,000,000]

Dot Diagnosis and Fix

• Too much load for existing E450
• Qualified and put into production the [evaluation] H70
  – Greater than 60% increased throughput
  – Jump from 220M resolutions per day to over 400M
• Qualified and put into production an S80 as placeholder for upcoming M80 deployment
  – Greater than factor of three improvement over previous E450
• Tweaked TCP keepalive defaults and BIND select loop
• Filtered dynamic updates
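The slide does not say which keepalive settings were changed or how; on a production name server they were most likely system-wide defaults. For readers unfamiliar with the mechanism, here is a minimal per-socket sketch of TCP keepalive tuning. The function name and the timing values are hypothetical, and the tuning options are platform-specific, so each one is applied only where the platform exposes it.

```python
import socket


def make_keepalive_socket(idle_s=60, interval_s=10, probes=5):
    """Create a TCP socket with keepalive enabled.

    The timing values are illustrative, not the ones used in the deployment.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Enable keepalive probes on otherwise idle connections.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning knobs; not every platform defines all of them.
    for name, value in (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before the drop
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock


sock = make_keepalive_socket()
print("keepalive enabled:",
      sock.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
sock.close()
```

Shorter keepalive timers let a server reclaim dead TCP connections sooner, which matters when connection slots are scarce under heavy load.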

The New Dot

A Root resolutions per day with H70

[Chart: vertical axis 0 to 500,000,000]

Packet Drops

Percent packets dropped, day of H70 deployment

[Chart annotations: Deployed 11 a.m.; “Current” time (9 a.m. day after)]

Upcoming access - www.dnsentral.net