25
VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

Embed Size (px)

Citation preview

Page 1: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

VO Query Language

GSFC XML Group

Ed Shaya

Brian Thomas

Kirk Borne

Page 2: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

VOQL Requirements

Provide a means for users to submit general requests for astronomical information from a distributed set of repositories.

Allow for the science use cases. Easy to learn and use:

– Hide from the user obvious but tedious steps– May require several levels o f language with only the top level being

easy. Allow for web form entry. Independent of internal arrangement of data at repositories. Plug-n-play metadata and ontology. Span a distributed set of heterogeneous services.

– Each VO query can transform to multiple queries in local dialects.– Workflow of interactions between registries, services, and user.– Integration of multiple responses

Page 3: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

More VOQL Requirements

Easy to parse and transform into other forms Extensible

– Sites can extend query language through local namespaces– VO namespace can add language elements into the future.

Page 4: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

XML Query Language

Compatible XML and Human-Readable versions Xquery is a superset of Xpath Based on Quilt, XQL, and XML-QL

– Quilt is based on Object Query Langauge (OQL)– OQL is based on Structured Query Language (SQL)

If,then,else: case switch: basic functions: define new functions FLWR (for, let, where, return)

for $i in (1 to 3)

let $j := (1 to $i)

Results in:

$i = 1, $j = 1

$I = 2, $j = (1,2)

$I =3, $j = (1,2,3)

Page 5: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

XQuery Continued

for $s in document('‘bright_stars.xml'')/*/id_mainlet $b := document('‘photometry.xml'')/*/star[name =

$s]/band where count ($b) > 1

return<colors>

<starName>$i</starname>for $j in (2 to count($b))<color name=“$b[$j]@name - $b[$j-

1]@name”>$b[$j]/value - $b[$j-1]/value

</color></colors>

Page 6: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

Page 7: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

OLAP/XMLA

On-line Analytical Processes Reduces bandwidth/time of data out Statistical Package add on to Databases Analysis of DataCubes

– Hierarchy of Axis Values Years, Months, Days, Hours, minutes Degrees, minutes, seconds Interior, core, mantle, atmosphere, mesosphere, exosphere

Page 8: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

JVO Query Language – Naoki Yasuda

Retrieves catalog data and images from multiple data servers via a single user interface

Extension of SQL– Catalog.UCD– Box(Point(c1.ra,c1.dec), width1,height1)– XMATCH(c1,c2,!c3,…)< 3 arcsec– Select Catalog: Keyword1 & Keyword2

Select by [[MAX|MIN](PROPERTY) | ALL] [NAME]

– Area : [inside|outside] area0

Area1 [overlap|union] area2 | shape

SHAPE: box, circle, oval, triangle,point

DIFF(x.obs_date, y.obs_date) > 30 days

Page 9: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

Data mining

Beyond finding data; intense data filtering, conditioning, knowledge synthesis.

Grid Services?– Principal Component Analysis– Iterative solutions– Genetic algorithms– Maximum-likelihood functions– Neural nets– Decision trees – Cluster analysis– Regression analysis

Page 10: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

Data Objects

Dataset– Tables

Fields– Units– Class (UCD)– Range– Values

– Images Axes Coordinate Maps Data Values

– Spectra Wavelength Intensity

Page 11: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

ADQL

Obtain Data Sets– By bibliographic query

Author, date published, title, journal, volume– By description

Keywords, abstract, mission name Obtain tables

– By title, table #, field names – By Xpath

/LocalGroup/[galaxy=“M31”]/region7/v-band– Obtain table data by UCDs or field names– Min/max of range, regular expression

Obtain N-cube data– Subset by axis values, – subset by ra,dec, radius or more generally Func(axes1..)

Page 12: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

Astronomy Data Query Language (ADQL)

Page 13: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

ADQL/Query Schema

Page 14: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

Knowledge Based Query

Class Instance Objects Property (V-band) Instance value (-1.4)

– Measurement property values are Data – Modifier (aperture) Instance value (3 arcsec)

Modifier (inequality) Instance value (before, not)

– Aggregate property – member, region, component Values are bags of objects

– SubclassOf property – subclass has restricted property value range or restricted list of properties.

Property Space – N-properties form a space. A bit of math is needed to relate values.

Page 15: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

Problem Statement Language: Root

Page 16: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

PSL Constraint

Page 17: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

PSL AstroObject

Page 18: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

<dataset subject="astronomy">

<title>AC 2000.2: The Astrographic Catalogue on the Hipparcos System</title>

<altname type="ADC">1275</altname>

<altname type="CDS">I/275</altname>

<altname type="brief">The AC 2000.2 Catalogue</altname>

<references type="source">

<reference>

<title>AC 2000.2: The Astrographic Catalogue on the Hipparcos System</title>

<author><initial>S</initial><initial>E</initial><lastName>Urban</lastName></author>

<author><initial>T</initial><initial>E</initial><lastName>Corbin</lastName></author>

<author><initial>G</initial><initial>L</initial><lastName>Wycoff</lastName></author>

<author><initial>E</initial><lastName>Hoeg</lastName></author>

<author><initial>C</initial><lastName>Fabricius</lastName></author>

<author><initial>V</initial><initial>V</initial><lastName>Makarov</lastName></author>

<journal><name>Astron. J.</name><volume>115</volume><pageno>1212</pageno>

<date><year>1998</year></date><bibcode>1998AJ....115.1212U</bibcode>

</journal>

</reference>

</references>

Dataset Schema

Page 19: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

<keywords

xml:base=http://adc.gsfc.nasa.gov/keywordLists/adc/ parentListURL="adc_keywordList.html">

<keyword xlink:href="kw_p.html#Positional_data">Positional data</keyword>

<keyword xlink:href="kw_a.html#Astrographic_zones">Astrographic zones</keyword>

<keyword xlink:href="kw_s.html#Surveys">Surveys</keyword>

</keywords>

<descriptions>

<description>

<para>

The AC 2000.2 is a revised version of the 1997 release of the AC 2000 (Cat. <I/247>). It was decided that the availability of an improved reference catalogue and the inclusion of photometry from the Tycho-2 catalogue would be sufficient to warrant a complete re-reduction of the data and a new distribution of the catalogue. The AC 2000.2 catalog contains positions of 4,621,751 stars at the average epoch of plate exposures for each star (average 1907).

</para>

</description>

Dataset Continued

Page 20: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

Case Study 0: Setting up the Query

Return RA, Dec, Vmag for stars with 13<Vmag<15 and 10:12:53.5<RA<13:13:43 and 18:38:00<DE< 18:40:00.

PSL: <object class=“star”>

<property name=“Vmag”><range min=“13” max=“15”/><value>?vmag</value>

</property><property name=“RA”>

<range min=“10:12:53.5” max=“13:13:43”/>\<value>?ra</value>

</property><property name=“DE”>

<range min=“18:38:00” max=“18:40:00”/><value>?de</value>

</property></object>

Page 21: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

Case Study 0: Mapping Query to Metadata

Search for tables with metadata that satisfy: – Object/[class=“star”] –search-> keyword, description– Property[@name=“Vmag”] –search-> field/UCD, name– Property[@name=“RA”] –search-> field/UCD, name– Property[@name=“DE”] –search-> field/UCD, name– Property/range –search-> field/min and field/max or coverage

attributes

For all such tables, return: ?vmag, ?ra, ?de Also, return group/field[@name=“error”] for group with

Vmag info.

Page 22: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

Problem Statement Language (PSL)

Begin RequestConstraint

        Find astronomical objects with the following properties:            AND these properties                 1. Name: assign to var1                 2. Class is "cluster of galaxies | galaxy cluster"                 3. Measurement quantities satisfy:                      a. X-ray brightness > 3.3E7Jy :  assign to var2                        1. Time interval of measurement: 1998Y-1999Y     

         Using the above variables satisfy, the math formulae:

                1. (var2 + var3) < (var1 – log[var4]) OR these constraints             [several constraints for which one must be true etc ]Return a table with the following sequence of fields:   var1    var2  

End Request

PSL Pull down

AndConstrainties, Andproperties

PSL Pull down

AndConstrainties, Andproperties

Property Name Pull Down

Name, Class, etc.

Property Name Pull Down

Name, Class, etc.

MathML Pull down

*,-,/,+,sum,avg,<,>, etc

MathML Pull down

*,-,/,+,sum,avg,<,>, etc

Page 23: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

Brian Thomas’ Infrastructure

Page 24: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

Tony Linde’s Infrastructure

VO activity – User – Problem Assistant – service to help user state the problem– Ontology – terms and relationships derived from existing data

Workflow – to retrieve data, merge it, analyze it, reduce it Registry – lists all services and their high level metadata Job Control – decides which jobs and when Data Centre – receiver of query for all internal data sources Data Source Service – uses translator to restate query Translator – from data query language to implemented service Languages

– Problem Statement Language (PSL) – Workflow Language (WFL) – Astronomical dataset Query Language (ADQL) – Ontology Query Language (OQL) – Registry Query Language (RQL)

Page 25: VO Query Language GSFC XML Group Ed Shaya Brian Thomas Kirk Borne

May 12-16 2003 IVO Meeting @ Cambridge

Conclusion

Metadata should clearly distinguish between values that are property values and those that are modifiers of properties.

Then, a mapping from a natural(ish) scientific knowledge based language (PSL) to a request language for data-center common items (ADQL) is possible.

A federated system with a VO-wide vocabulary plus specialized (local) namespaces is best for getting started right away and permitting for evolution.