Data Centre Infrastructure


  • 8/6/2019 Data Centre Infrastructure

    1/26

    Workshop, Garching, June 27 - July 1 2005

    Data Centre Infrastructure

    Keith Noddle, AstroGrid


    Introduction

    What is the Virtual Observatory (VO)?

    "The VO vision can be summed up as the desire to make all archives speak the same language: all searchable and analysable by the same tools, all data sources accessible through a common interface, all data held in distributed databases that appear as one."
    Andy Lawrence, 09/2003


    Introduction

    Key to this, then, must be the mechanisms we use to access those data.

    Enter the Data Centre Infrastructure as envisaged and implemented by the IVOA partners!


    Data Centre Infrastructure

    Data Centres contain catalogues, images, spectra, etc., all lovingly stored in many and various ways.

    The trouble is, there's no one way to get at these data; each archive seems to have its own bespoke access layer, often for very good reasons.


    Data Centre Infrastructure

    And of course, there's no point being able to access these data if they can't then be further processed.

    Increasingly, such processing is surpassing the capacity of desktop machines to perform.


    Data Centre Infrastructure

    This is where the VO comes in.

    This talk concentrates on how data are accessed, but does so within the wider context of the VO as a whole.


    Accessing The Data: Now

    Access methods are mostly heterogeneous, that is, there's not much commonality between them.

    Methods include:
      CGI
      Bespoke Web Services


    Accessing The Data: Vision

    The VO vision is to make access common across compliant Data Centres.

    This is being achieved through the development of a series of standards, overseen by the International Virtual Observatory Alliance (IVOA).


    Accessing The Data: Standards

    The IVOA has agreed (or is agreeing) several standards for data access, dependent upon the data themselves. These include:
      A standard query language (ADQL / VOQL)
      SkyNode
      SIAP
      SSAP
      Cone Search
      CEA


    ADQL

    ADQL is an XML language for constructing queries.

    ADQL has two forms:
      ADQL/x: an XML document conforming to a published XSD
      ADQL/s: a string form based on SQL92 conforming to the ADQL grammar. Some non-standard extensions to SQL92 are added to support distributed astronomical queries.

    ADQL/x and ADQL/s are translatable between each other without loss of information.

    Extensions to SQL92:
      Aliases
      Regions (CIRCLE)
      Mathematical Functions


    ADQL Example

    ADQL/s:

    Select a.* from Tab a where Region('Circle Cartesian 1.2 2.4 3.6 0.2')

    Becomes ADQL/x:


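    ADQL Example (cont)

    [ADQL/x listing: the XML markup was lost in extraction. Surviving values: X Y Z = 1.2, 2.4, 3.6, followed by 0.2, matching the arguments of the Region string in the ADQL/s form.]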
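As a rough illustration of what a translation from ADQL/s to ADQL/x has to extract, the Region argument from the example can be split into its parts. This is a minimal Python sketch, not the ADQL translator itself: the names `CircleRegion` and `parse_circle_region` are hypothetical, and it assumes the argument always has the shape 'Circle Cartesian x y z r' seen in the example. Reading the last number as a radius is also an assumption.

```python
from typing import NamedTuple


class CircleRegion(NamedTuple):
    """Hypothetical container for the parts of a Region('Circle ...') argument."""
    frame: str
    x: float
    y: float
    z: float
    radius: float  # assumption: the final number is the circle radius


def parse_circle_region(arg: str) -> CircleRegion:
    """Split a string such as 'Circle Cartesian 1.2 2.4 3.6 0.2' into named parts."""
    tokens = arg.split()
    # Expect: shape keyword, frame keyword, then exactly four numbers.
    if len(tokens) != 6 or tokens[0].lower() != "circle":
        raise ValueError(f"unsupported region string: {arg!r}")
    x, y, z, radius = (float(t) for t in tokens[2:])
    return CircleRegion(tokens[1], x, y, z, radius)


region = parse_circle_region("Circle Cartesian 1.2 2.4 3.6 0.2")
print(region)
```

Each named field would then map onto one element of the XML form, which is why the two forms can be converted without loss of information.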


    SkyNode


    SIAP/SSAP/SLAP

    Simple Image Access Protocol

    Simple Spectral Access Protocol

    Simplistically:
      Take position and region as input
      Return pointers to lists of downloadable datasets
      Other optional capabilities

    Pending: Spectral Line Access Protocol
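The "position and region as input" step can be sketched as building a query URL. This is a minimal Python sketch under stated assumptions: the `POS` and `SIZE` parameter names come from the SIAP specification rather than from these slides, while the service URL `http://example.org/siap` and the function name `build_siap_url` are hypothetical placeholders.

```python
from urllib.parse import urlencode


def build_siap_url(base_url: str, ra_deg: float, dec_deg: float, size_deg: float) -> str:
    """Build a SIAP-style query URL from a position and a region size in degrees."""
    params = {
        "POS": f"{ra_deg},{dec_deg}",  # centre of the region of interest
        "SIZE": f"{size_deg}",         # angular size of the region
    }
    # safe="," keeps the comma in POS readable instead of percent-encoding it.
    return f"{base_url}?{urlencode(params, safe=',')}"


# Hypothetical service endpoint, used only for illustration.
url = build_siap_url("http://example.org/siap", 180.0, -1.5, 0.25)
print(url)
```

A real service would answer such a request with a table whose rows point at the downloadable datasets, which is the "return pointers" half of the protocol.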


    Common Execution Architecture

    The CEA defines a reasonably small set of interfaces and schema to model how a typical astronomical application can be executed within the VO.

    In particular it:
      Defines a uniform interface and model for an application and its parameters
      Provides a higher level description than WSDL 1.1 can offer
      Provides extensions with the VO Resource schema
      Supports asynchronous operation of an application
      Allows for the data flow to not necessarily have to follow the call tree

    See: http://www.ivoa.net/Documents/Notes/CEA/CEADesignIVOANote-20050513.html


    Accessing The Data: Benefits

    The crucial benefits include:
      A standard mechanism for accessing data of a given type
      Interoperability between datasets, lending itself to practical multi-wavelength analysis
      Machine accessibility to these data


    Accessing The Data: Examples

    Looking at the two services below, we can see how the following ADQL query can be executed on both systems without change, table names excepted. From that, the possibilities for undertaking practical research across multiple datasets become apparent.

      The first query can be run against the SDSS using OpenSkyQuery
      The second can be run against INT-WFS using the AstroGrid portal


    Accessing The Data: ADQL

    The first query is:

      SELECT o.g, o.r FROM
      sdss:photoprimary o WHERE
      (o.Err_g > 0) AND
      (o.Err_g < 0.5) AND
      (o.Err_r > 0) AND
      (o.Err_r < 0.5) AND
      (o.g - o.r > 5)

    The second is:

      SELECT o.g, o.r FROM
      obj as o WHERE
      (o.Err_g > 0) AND
      (o.Err_g < 0.5) AND
      (o.Err_r > 0) AND
      (o.Err_r < 0.5) AND
      (o.g - o.r > 5)
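The point that the two queries differ only in their table expression can be made concrete with a small template. This is a minimal Python sketch: the query text and the two table expressions are copied from the queries above, while `QUERY_TEMPLATE`, `TABLES` and `build_queries` are hypothetical names introduced for illustration. It only builds the query strings and does not contact either service.

```python
from typing import Dict

# Shared query body; only the table expression varies between services.
QUERY_TEMPLATE = (
    "SELECT o.g, o.r FROM {table} WHERE "
    "(o.Err_g > 0) AND (o.Err_g < 0.5) AND "
    "(o.Err_r > 0) AND (o.Err_r < 0.5) AND "
    "(o.g - o.r > 5)"
)

# Table expressions as they appear in the two queries on the slide.
TABLES = {
    "SDSS via OpenSkyQuery": "sdss:photoprimary o",
    "INT-WFS via AstroGrid portal": "obj as o",
}


def build_queries(template: str, tables: Dict[str, str]) -> Dict[str, str]:
    """Fill the shared template once per service-specific table expression."""
    return {name: template.format(table=table) for name, table in tables.items()}


queries = build_queries(QUERY_TEMPLATE, TABLES)
for name, query in queries.items():
    print(f"{name}:\n  {query}")
```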


    Specific Implementation

    7.7 Million,