ANF SYSTEMS ARCHITECTURE Antelope Users Group 2012 University of Nevada, Reno, NV

Page 1: ANF SYSTEMS ARCHITECTURE

ANF SYSTEMS ARCHITECTURE

Antelope Users Group 2012, University of Nevada, Reno, NV

Page 2: ANF SYSTEMS ARCHITECTURE

WHERE WE WERE

• Mid-2005

• Single Sun V240 with 3.5 TB storage

• Single Linux server for Web site

• Single datacenter

Page 3: ANF SYSTEMS ARCHITECTURE

INFRASTRUCTURE AT MAIN DATACENTER

• 3 Dell R710 x86_64 servers: 192 GB RAM, 512 GB mirrored root disks, 2-port Fibre Channel HBA

• Redundant Cisco Switches

• Cisco VPN Gateway, Serial Console Server for Mgmt.

• 3 Apple Xserves

• Redundant QLogic SAN Switches

• Compellent Storage System for Block Storage

• Shared data via NFS

• Legacy Sun Hardware - 3 T5220 systems, 1 T2000 (for network monitoring)

Page 4: ANF SYSTEMS ARCHITECTURE

VIRTUAL MACHINES AND ZONES

• Live on top of our physical infrastructure

• Primary OS disk and host specific data volumes on SAN storage

• Can be moved between physical servers with varying degrees of ease: VMware migration is automatic, Zones are moved manually

• 15-20 CentOS Linux VMs (varies)

• ~15 Solaris Zones

Page 5: ANF SYSTEMS ARCHITECTURE

BACKUP DATACENTERS

• Main DR site at IGPP

• 2 Sun T2000 servers

• 3 Nexsan SATABeast Storage Arrays

• 1 AC&NC JetStor Array

• Redundant LAN and SAN

• Total ~160 TB backup storage

• Scorched earth site at IRIS DMC

• Single Sun T2000 with 768 GB RAID5 internal disks

Page 6: ANF SYSTEMS ARCHITECTURE

CURRENT INFRASTRUCTURE

[Diagram: VMware vSphere cluster (three VMware nodes, CentOS 6.2 virtual machines), Puppet Configuration Management, Storage Area Network (Compellent), two Solaris nodes, four Solaris Zones, Intermapper (Network Monitoring)]

Page 7: ANF SYSTEMS ARCHITECTURE

KEY ARCHITECTURE HIGHLIGHTS

• Virtualization

• SAN Storage

• Redundant Network and SAN Connections

• Configuration Management

Page 8: ANF SYSTEMS ARCHITECTURE

WHY VIRTUALIZATION?

• Keep intensive processing jobs from interfering with real-time data collection

• Separation of core systems from analyst work and testing

• Easier hardware maintenance

• Easily create clones of existing systems for testing

Page 9: ANF SYSTEMS ARCHITECTURE

VIRTUALIZED ANTELOPE SYSTEMS

• Acquisition

• Import/Export Hub

• Waveform Writes

• Real-time Processing

• Real-time data review

• Web products

• Display Kiosks

• Dev/Test

• Analyst

• Post Review

Page 10: ANF SYSTEMS ARCHITECTURE

INTERMAPPER

Page 11: ANF SYSTEMS ARCHITECTURE

SAN STORAGE

• Compellent system at primary datacenter

• 110 TB primary storage

• No more manually trying to move data to fast disk or slow disk depending on workload

• All of our old disparate storage arrays pressed into service as offsite replication targets for Disaster Recovery

• DR data synced via ZFS snapshots
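The slides do not show how the ZFS snapshot sync is driven, so the following is only a minimal sketch of one incremental replication pass, assuming the standard `zfs snapshot`, `zfs send -i`, and `zfs receive` commands piped over SSH. The dataset names, snapshot names, and DR host are hypothetical placeholders, and the function only builds the command lines rather than running them.

```python
import shlex


def incremental_sync_commands(dataset, prev_snap, new_snap, dr_host, dr_dataset):
    """Build the shell command lines for one incremental ZFS replication pass.

    All arguments are hypothetical examples; nothing is executed here.
    """
    # 1. Take a new point-in-time snapshot on the primary side.
    take = ["zfs", "snapshot", f"{dataset}@{new_snap}"]
    # 2. Send only the blocks that changed since the previous snapshot ...
    send = ["zfs", "send", "-i", f"{dataset}@{prev_snap}", f"{dataset}@{new_snap}"]
    # 3. ... and apply that stream to the dataset on the DR host.
    recv = ["ssh", dr_host, "zfs", "receive", "-F", dr_dataset]
    return [shlex.join(take), shlex.join(send) + " | " + shlex.join(recv)]


if __name__ == "__main__":
    for line in incremental_sync_commands(
        "tank/waveforms", "2012-06-01", "2012-06-02",
        "dr.example.org", "backup/waveforms",
    ):
        print(line)
```

Because each pass ships only the difference between two snapshots, the amount of data sent to the DR site scales with what changed rather than with the total volume size.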

Page 12: ANF SYSTEMS ARCHITECTURE

WHY COMPELLENT?

• Antelope and our local extensions generate a lot of different storage workloads

• Waveform data is write biased, lots of small writes

• Once it’s written, it’s typically read in large sequential chunks

• Orbservers can be read or write biased depending on their use

• RRDs really write biased, horribly inefficient

Page 13: ANF SYSTEMS ARCHITECTURE

WHY COMPELLENT?

• Existing storage couldn’t keep up with write biased workload.

• We bought faster disk but it couldn’t fit all of our data there

• Copying data from one volume to another is slow, adds even more workload to overtaxed storage

• We have core systems requirements, but we’re always conducting experiments with data. Hard to predict what will be the most popular data for the day.

Page 14: ANF SYSTEMS ARCHITECTURE

COMPELLENT MANAGEMENT

• Writes go into top tier fast disk, trickle down to slower storage automatically

• Least used data trickles to lowest tier of cheap slow disk

• If something suddenly becomes “hot” it will move up to higher tiers
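The tiering behaviour described on this slide can be illustrated with a toy model. This is not Compellent's actual algorithm: the three tier names, the per-cycle access counter, and the `hot_threshold` value are all assumptions chosen to show the idea of writes landing on fast disk, idle data sinking, and hot data rising.

```python
from dataclasses import dataclass, field

TIERS = ["fast", "medium", "slow"]  # hypothetical tier names, fastest first


@dataclass
class TieredStore:
    tier_of: dict = field(default_factory=dict)  # block id -> tier index
    hits: dict = field(default_factory=dict)     # block id -> accesses this cycle

    def write(self, block):
        # New writes always land on the top (fastest) tier.
        self.tier_of[block] = 0
        self.hits[block] = self.hits.get(block, 0) + 1

    def read(self, block):
        self.hits[block] = self.hits.get(block, 0) + 1

    def rebalance(self, hot_threshold=3):
        """One background pass: promote hot blocks, demote untouched ones."""
        for block, tier in self.tier_of.items():
            accesses = self.hits.get(block, 0)
            if accesses >= hot_threshold:
                self.tier_of[block] = max(tier - 1, 0)               # move up
            elif accesses == 0:
                self.tier_of[block] = min(tier + 1, len(TIERS) - 1)  # trickle down
            self.hits[block] = 0  # start a fresh counting cycle

    def tier_name(self, block):
        return TIERS[self.tier_of[block]]
```

In this model a freshly written block stays on fast disk for the cycle in which it was written, sinks one tier for every idle cycle after that, and climbs back one tier per cycle once reads pick up again.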

Page 15: ANF SYSTEMS ARCHITECTURE

VMWARE AND LINUX MIGRATION

• VMware cluster

• 3 servers running at 2/3rds capacity, can survive outage of one server with no degradation of performance

• Core TA acquisition, processing, distribution migrated to Linux VMs (CentOS 6.2)

• 110 TB of Compellent storage

• Most legacy Solaris systems are being phased out

• TA acquisition, import/export and analyst processing all on Linux

• Web processing partially migrated

Page 16: ANF SYSTEMS ARCHITECTURE

MIGRATION ADVANTAGES AND BENEFITS

• Commodity Intel hardware price/performance significantly better than Oracle (formerly Sun) SPARC, especially since the Oracle acquisition

• Ongoing software support looks better long term on Linux

• Especially with regard to Antelope

• Other open source components used in our "stack" have been harder to keep up to date on Solaris

• CPU speeds significantly faster

• No need to differentiate workloads between floating point and integer

• Catching up from a maintenance outage is significantly faster

• Real-time database processing by analysts is significantly faster

• Startup time for dbloc2 went from 10 minutes down to 30 seconds

Page 17: ANF SYSTEMS ARCHITECTURE

FURTHER MIGRATION ADVANTAGES AND BENEFITS

• Easier to spin up new virtual machines

• duplicate most of environment for testing

• scaling - add new processing nodes as needed, configured like existing ones

• Made easier by our use of Puppet

• clone a fresh base system, assign role, puppet does the rest. Installs packages, starts/stops services, etc

• Infrastructure represented in code. Software, NFS permissions, allowed users, switch configurations.

Page 18: ANF SYSTEMS ARCHITECTURE

VMWARE

Page 19: ANF SYSTEMS ARCHITECTURE

VMWARE

• Three physical systems running ESXi 5.1 on bare metal

• Virtual machine that works as a supervisor, command dispatcher

• Automatically monitors load, migrates VMs to less loaded physical nodes as needed, transparent to users.

• Running at 2/3rds capacity so one server can be down for maintenance
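The 2/3rds figure follows from simple arithmetic: if N identical servers share a load evenly and one fails, the survivors each carry N/(N-1) times their previous load. A minimal sketch of that calculation, assuming identical servers and perfectly even load distribution (both simplifications):

```python
from fractions import Fraction


def utilization_after_failure(nodes, utilization, failed=1):
    """Per-node utilization once `failed` nodes are gone and load is re-spread."""
    if failed >= nodes:
        raise ValueError("no surviving nodes to carry the load")
    total_load = nodes * Fraction(utilization)
    return total_load / (nodes - failed)


def max_safe_utilization(nodes, failed=1):
    """Highest steady-state utilization that still fits after `failed` nodes drop."""
    return Fraction(nodes - failed, nodes)


if __name__ == "__main__":
    # Three nodes at 2/3 each: two survivors end up exactly full.
    print(utilization_after_failure(3, Fraction(2, 3)))
    print(max_safe_utilization(3))
```

For a three-node cluster the ceiling is exactly 2/3, so running at that level leaves the two remaining servers fully used but not oversubscribed while the third is down.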

Page 20: ANF SYSTEMS ARCHITECTURE

SYSTEMS PROVISIONING

• The cool kids call it “DevOps”

• Something similar in use almost since the beginning of the project (2005), prior to that buzzword being coined

• We aren’t using ControlTier but something at that tier will replace my few remaining SSH loops

Page 21: ANF SYSTEMS ARCHITECTURE

CONFIGURATION MANAGEMENT WITH PUPPET

• http://puppetlabs.com

• Puppet is a configuration management tool with a declarative syntax. Similar in concept to Makefiles.

• You describe what something should look like, and the interpreter figures out the steps from current configuration to desired configuration.
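To make the declarative idea concrete, here is a toy model of "describe the end state, let the tool work out the steps". It is not how Puppet is implemented; the state layout (a set of packages and a map of service states) and the action names are assumptions made up for the illustration.

```python
def plan(current, desired):
    """Return the ordered actions that turn `current` into `desired`."""
    actions = []
    # Packages go first so that services have something to run.
    for pkg in sorted(desired["packages"] - current["packages"]):
        actions.append(("install", pkg))
    for svc, want in sorted(desired["services"].items()):
        have = current["services"].get(svc, "stopped")
        if have != want:
            actions.append(("start" if want == "running" else "stop", svc))
    return actions


def apply(current, actions):
    """Return a new state with `actions` applied; `current` is left untouched."""
    new = {
        "packages": set(current["packages"]),
        "services": dict(current["services"]),
    }
    for verb, name in actions:
        if verb == "install":
            new["packages"].add(name)
        elif verb == "start":
            new["services"][name] = "running"
        elif verb == "stop":
            new["services"][name] = "stopped"
    return new


if __name__ == "__main__":
    current = {"packages": set(), "services": {}}
    desired = {"packages": {"openssh-server"}, "services": {"sshd": "running"}}
    steps = plan(current, desired)
    print(steps)
    # A second run against the converged state has nothing left to do.
    print(plan(apply(current, steps), desired))
```

The key property is idempotence: once the system matches the description, planning again yields no actions, which is why the same description can be applied repeatedly without harm.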

Page 22: ANF SYSTEMS ARCHITECTURE

INFRASTRUCTURE AS CODE. LITERALLY.

Page 23: ANF SYSTEMS ARCHITECTURE

PUPPET CODE EXAMPLE

package { 'openssh-server':
  ensure => installed,
}

file { '/etc/ssh/sshd_config':
  source  => 'puppet:///modules/sshd/sshd_config',
  owner   => 'root',
  group   => 'root',
  mode    => '0640',
  notify  => Service['sshd'], # sshd will restart whenever you edit this file.
  require => Package['openssh-server'],
}

service { 'sshd':
  ensure     => running,
  enable     => true,
  hasstatus  => true,
  hasrestart => true,
}

Page 24: ANF SYSTEMS ARCHITECTURE

PUPPET DASHBOARD

• Dashboard gives me a quick overview of changes to code.

• Can also monitor other files/services not controlled by Puppet to alert to changes.

Page 25: ANF SYSTEMS ARCHITECTURE

PUPPET-ANTELOPE

• This wouldn’t belong at AUG without tying it into Antelope somehow.

• https://github.com/UCSD-ANF/puppet-antelope

antelope::instance { 'antelope':
  user => 'rt',
  dirs => ['/rtsystems/foo', '/rtsystems/bar'],
}

antelope::instance { 'antelope-baz':
  user        => 'basil',
  dirs        => '/rtsystems/baz',
  manage_fact => false, # don't create an entry in the antelope_services fact for this instance.
}

Page 26: ANF SYSTEMS ARCHITECTURE

ANTELOPE MANAGEMENT

• We use a lot of extensions to Antelope

• PHP, Perl bindings, Image Magick

• Customized parameter files for ANF

• Instrument responses

• Typically running two or three releases of Antelope

• Want consistent extensions available on all of them

Page 27: ANF SYSTEMS ARCHITECTURE

ANTELOPE BUILD PROCESS

• All managed by a series of Makefiles and helper scripts

• Install Antelope

• Copy in initial license file for the build host

• Install $ANTELOPE overlays (localmake_config, some instrument responses)

• Build $ANF site local tree

• Build contrib

Page 28: ANF SYSTEMS ARCHITECTURE

ANTELOPE DISTRIBUTION

• Rsync based

• We tried packages, but contrib code changes too often; we don’t want to redistribute a 600+ MB package for a 2 MB change during incremental builds

• Deploy to development hosts first

• Stop acquisition, kick analysts off

• Deploy to production

• Deploy to backup sites
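The staged rollout above can be sketched as a small plan generator. The slides do not show the actual scripts, so the host names, the `/opt/antelope` paths, and the chosen rsync flags (`-a` for archive mode, `--delete` to remove files that no longer exist in the source tree) are assumptions for illustration. The function only builds the command lines; it does not run them.

```python
# Hypothetical hosts, listed in the order the slide gives for deployment.
STAGES = [
    ("development", ["dev1.example.org"]),
    ("production", ["prod1.example.org", "prod2.example.org"]),
    ("backup", ["dr1.example.org"]),
]

PAUSE_NOTE = "# stop acquisition and log analysts off before continuing"


def rsync_command(src, host, dest):
    """Build one rsync invocation as an argument list."""
    # A trailing slash on the source copies the directory's contents
    # into `dest` instead of creating a nested directory.
    return ["rsync", "-a", "--delete", src.rstrip("/") + "/", f"{host}:{dest}"]


def rollout_plan(stages, src, dest):
    """Return the ordered lines of a rollout: dev, then production, then backup."""
    lines = []
    for name, hosts in stages:
        if name == "production":
            lines.append(PAUSE_NOTE)
        for host in hosts:
            lines.append(" ".join(rsync_command(src, host, dest)))
    return lines


if __name__ == "__main__":
    for line in rollout_plan(STAGES, "/opt/antelope", "/opt/antelope/"):
        print(line)
```

Since rsync transfers only files that differ between source and destination, a small contrib change results in a correspondingly small transfer, which is the motivation the slide gives for preferring it over rebuilding a full package.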

Page 29: ANF SYSTEMS ARCHITECTURE

POST DISTRIBUTE TASKS

• All managed with Puppet

• System role specific parameter files (Transportable Array versus ANZA)

• Site-specific (and version-specific) license files

• Install /etc/init.d/antelope with customized list of rtexec instances