Jan 12, 2004 ABC-PROJECT 1

ABC Data Management

Erwin R. Boer (Cal-IT2, UIOWA, UMN, ERB-Consulting)

Knowledge Extraction at our Fingertips: Hypothesis testing at any Scale

Joining Forces to Build an ABC Data Center in La Jolla

Jan 12, 2004 ABC-PROJECT 2

ABC Observation Stations

AUAV Mission Planning and Control Requires Real Time Data Integration

Jan 12, 2004 ABC-PROJECT 3

Central Equatorial Pacific Experiment (CEPEX)

Indian Ocean Experiment (INDOEX)

Jan 12, 2004 ABC-PROJECT 4

ABC Goes beyond Physical Sciences

Jan 12, 2004 ABC-PROJECT 5

Serving the Scientists through Speed and Flexibility

Knowing where Data Preparation Ends and Science Starts

Providing accessibility plus controllability

• Speed - Reducing data preparation time increases chance research gets conducted
  – Establishing data sets for further analysis (e.g. extraction and collocation)
  – Structuring the data for further analysis (e.g. registering for data assimilation for climate models)
• Flexibility - Enhancing the data manipulation environment increases scientific production
  – Manipulating data for analysis (e.g. ingest into existing tools)
  – Finding structure in the data (e.g. multidimensional visualizations)
  – Outlier identification (e.g. identify samples across views)
  – Integrated visualization of data and analysis results (e.g. integrate results back into dataset)
  – Finding similar cases in all data (e.g. example-based data mining)
• Provide immediate service and build augmented tools to continually improve old and introduce new services & tools

Jan 12, 2004 ABC-PROJECT 6

Goal pertaining to ABC

• Begin process of developing the ABC data center in La Jolla
• Adopt philosophy used in data integration for CEPEX and to a smaller degree for INDOEX
• Use new technologies to implement these scientist-centered design principles
• Write proposal over next 8 weeks
• Identify key people in relevant areas of expertise and approach them for collaboration
• Solicit ideas and ideals

Jan 12, 2004 ABC-PROJECT 7

Data Centers, Migration, and Augmentation of Data

Archival, Mission Planning, Hypothesis Testing, and Data Mining

?

Jan 12, 2004 ABC-PROJECT 8

The Twelve Curses of Data Manipulation

Not all Scientists are Programmers but Support for many Programming Environments is Necessary

1. Collocated data availability
2. Data volume
3. Data migration
4. Plurality of data formats
5. Enforced formats (CIDS)
6. Multiplicity of manipulation languages
7. Enforced tools
8. Data heterogeneity
9. Inflexible data mapping tools (Collocation)
10. Case uniqueness for similarity mining
11. Visualization as a means versus an end (3D, Sctr)
12. Human patience

Manipulation is as important as visualization, if not more so, partly due to precedence

Jan 12, 2004 ABC-PROJECT 9

http://www-c4.ucsd.edu/~cids/

Goal 1: Data conformity in format, coordinates, and meta-data.

Goal 2: Separation of spatiotemporal coordinates and data

Jan 12, 2004 ABC-PROJECT 10

Goal 3: Subset extraction of desired spatiotemporal bounding box.

Goal 4: Fast and informative collocation between any two (extracted) data sets.

http://www-c4.ucsd.edu/~cids/
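
The goals on separating coordinates from data and on bounding-box extraction can be illustrated with a small sketch. It is not the CIDS implementation: the names `SampleSet`, `BoundingBox`, and `extract` are hypothetical, the units are assumptions, and longitude wrap-around at the date line is ignored. The point is only that a subset request scans the coordinate lists and touches the data columns once, through the surviving indices.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SampleSet:
    """Hypothetical container: coordinates stored apart from measured variables."""
    time: List[float]                 # assumed: hours since an arbitrary epoch
    lat: List[float]                  # degrees north
    lon: List[float]                  # degrees east
    data: Dict[str, List[float]] = field(default_factory=dict)


@dataclass
class BoundingBox:
    t_min: float
    t_max: float
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float

    def contains(self, t: float, lat: float, lon: float) -> bool:
        return (self.t_min <= t <= self.t_max
                and self.lat_min <= lat <= self.lat_max
                and self.lon_min <= lon <= self.lon_max)


def extract(samples: SampleSet, box: BoundingBox) -> SampleSet:
    """Return the samples that fall inside the spatiotemporal box."""
    # Only the coordinate lists are scanned to decide what to keep.
    coords = zip(samples.time, samples.lat, samples.lon)
    keep = [i for i, (t, la, lo) in enumerate(coords) if box.contains(t, la, lo)]
    # Each data column is then read once, using the surviving indices.
    return SampleSet(
        time=[samples.time[i] for i in keep],
        lat=[samples.lat[i] for i in keep],
        lon=[samples.lon[i] for i in keep],
        data={name: [col[i] for i in keep] for name, col in samples.data.items()},
    )
```

Because the selection is expressed as an index list, the same indices could be reused for any number of variables attached to the same coordinates.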

Jan 12, 2004 ABC-PROJECT 11

Geostationary Satellite Image GOES-7

3D Cloud Reconstruction Under Strong Spectral Constraints

Jan 12, 2004 ABC-PROJECT 12

Interactive Visualization as Means to Effectuate Hypothesis Verification and Falsification

Jan 12, 2004 ABC-PROJECT 14

GMS JDAY 93 - 22:32 GMT

Jan 12, 2004 ABC-PROJECT 22

Situated Assessment of Data Constraints:

Screening for Multi-Layer versus Single-Layer Cloud Systems

Jan 12, 2004 ABC-PROJECT 23

Collocation between Multiple Aircraft

Trivial but Horrendously Time Consuming if Implemented Trivially

Coordinated Flight of AUAVs
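
To make the remark about trivial implementations concrete: comparing every sample of one aircraft with every sample of another costs time proportional to the product of the two track lengths. The sketch below is one common way to avoid that, not necessarily the method used in the project. It assumes each track is a time-sorted list of hypothetical `(time, lat, lon)` tuples and uses a binary search so the spatial test runs only on samples inside the time window. The function names and the tolerance parameters `max_dt` and `max_km` are illustrative.

```python
import math
from bisect import bisect_left, bisect_right

EARTH_RADIUS_KM = 6371.0


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine distance between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def collocate(track_a, track_b, max_dt, max_km):
    """Pair samples of two tracks that are close in both time and space.

    Each track is a list of (time, lat, lon) tuples sorted by time.
    Returns a list of (index_in_a, index_in_b) pairs.
    """
    times_b = [t for t, _, _ in track_b]
    pairs = []
    for i, (t, lat, lon) in enumerate(track_a):
        # Binary search limits the spatial test to samples of track_b
        # whose time lies in [t - max_dt, t + max_dt].
        lo = bisect_left(times_b, t - max_dt)
        hi = bisect_right(times_b, t + max_dt)
        for j in range(lo, hi):
            _, lat_b, lon_b = track_b[j]
            if great_circle_km(lat, lon, lat_b, lon_b) <= max_km:
                pairs.append((i, j))
    return pairs
```

With small time tolerances the inner loop visits only a handful of samples, so the cost is dominated by one binary search per sample of the first track.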

Jan 12, 2004 ABC-PROJECT 24

Automatic Collocation between Aircraft and Geostationary Satellite

Jan 12, 2004 ABC-PROJECT 25

Assessing Validity of Collocation Parameters

• Spatio-temporal tolerances
• Footprint matching
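
One simple way to assess spatio-temporal tolerances is to see how the number of accepted matches changes as the tolerances are loosened. The sketch below assumes the time and distance separations of candidate pairs have already been computed; the function name `match_counts` and its inputs are hypothetical, and footprint matching is not covered.

```python
def match_counts(separations, dt_tolerances, km_tolerances):
    """Count candidate pairs accepted under each tolerance combination.

    separations: list of (abs_time_difference, distance_km), one per
    candidate pair. Returns {(dt_tol, km_tol): count}.
    """
    return {
        (dt_tol, km_tol): sum(1 for dt, km in separations
                              if dt <= dt_tol and km <= km_tol)
        for dt_tol in dt_tolerances
        for km_tol in km_tolerances
    }
```

A count that jumps sharply between two neighboring settings suggests the tighter setting is discarding many near misses, which is the kind of sensitivity worth inspecting before trusting the collocated set.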

Jan 12, 2004 ABC-PROJECT 26

Identifying Curious Outliers

Jan 12, 2004 ABC-PROJECT 27

Scientists Struggled in their Quest to Resolve Uncertainty:

Computer Scientists Gave them Probing Tools

Jan 12, 2004 ABC-PROJECT 28

Atmospheric Brown Cloud - Project

Data Archival and Manipulation Goals

Establish Data Center in La Jolla for archival, dissemination, and augmentation of all ABC data

• DBMS – ORACLE?
  – Consistent with other sites (including mirror sites)
  – Different views consistent with multitude of project goals
  – Estimated at 100s of TBs
• Operational - Meta data augmentation and extraction
  – Textual
  – Coverage (specialized data structures)
  – Statistics
  – Snapshots
  – Visualizations
• Operational - Creation of integrated data sets
  – Initialization of and comparison with models
  – Hypothesis testing and data mining
• Interactive – Browsing for data suitability
  – Assess what data available when and from where (requires good visualizations)
  – Establish insight into data constraints
• Interactive – Data extraction and collocation
  – Extraction of specified subsets (e.g. in the field)
  – Collocation of heterogeneous data sets
  – Dispersement of ingest, registering, and visualization tools
• Interactive – Data Mining
  – Characterizing a particular event
  – Automatic search for specific conditions
  – Guided exploration of outliers
• Workshop
  – Scientists work with programmers to accommodate specialized needs
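
The data mining items above (characterizing an event, then searching for similar conditions) can be sketched as a nearest-neighbor search over case descriptions. This is only an illustration under stated assumptions: each case is reduced to a fixed-length feature vector, the function name `most_similar` is hypothetical, and scaling each feature by its standard deviation is one of several reasonable ways to make variables comparable.

```python
import math


def most_similar(cases, query, k=3):
    """Return indices of the k cases closest to the query case.

    cases: list of equal-length feature vectors, one per case.
    Each feature is divided by its standard deviation over all cases
    so that variables with large numeric ranges do not dominate.
    """
    scales = []
    for f in range(len(query)):
        column = [case[f] for case in cases]
        mean = sum(column) / len(column)
        var = sum((x - mean) ** 2 for x in column) / len(column)
        scales.append(math.sqrt(var) or 1.0)  # constant feature: leave unscaled

    def distance(case):
        return math.sqrt(sum(((c - q) / s) ** 2
                             for c, q, s in zip(case, query, scales)))

    ranked = sorted(range(len(cases)), key=lambda i: distance(cases[i]))
    return ranked[:k]
```

A linear scan like this is adequate for small case libraries; at the data volumes estimated above, an index structure would be needed, which is left out of the sketch.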

Jan 12, 2004 ABC-PROJECT 29

Research Issues

Identifying Key Players for Participation

• Database & Data Format
  – Efficient separation of spatio-temporal coordinates and sensor measurements or model predictions
  – Flexible data format for efficient extractions
• Extraction & Collocation
  – Efficient representation & visualization of spatio-temporal data coordinates (including spatio-temporal footprint)
  – Fast collocation between any number of heterogeneous spatio-temporal data structures
• Registering & Representation
  – Efficient representation of co-registered data sets
  – Flexible and expandable (e.g. include models) tools to map heterogeneous spatio-temporal data sets onto a common coordinate frame
• Case Representation & Mining
  – Efficient representation of a particular case (i.e. augmented data state)
  – Fast mining of similar cases
• Manipulation & Visualization
  – Efficient means to explore data from any abstract and model augmented perspective
  – Fast integrated visualization of heterogeneous data sets
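
As a rough illustration of mapping heterogeneous data sets onto a common coordinate frame, the sketch below averages scattered samples into regular latitude/longitude cells. It is plain binning, not a full registration scheme: the function name `grid_average`, the `(lat, lon, value)` input layout, and the one-degree default cell size are assumptions, and time and altitude are ignored.

```python
import math
from collections import defaultdict


def grid_average(samples, cell_deg=1.0):
    """Average scattered samples onto a regular latitude/longitude grid.

    samples: iterable of (lat, lon, value) with coordinates in degrees.
    Returns {(row, col): mean_value}, where row and col index cells of
    size cell_deg counted from latitude -90 and longitude -180.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lat, lon, value in samples:
        cell = (math.floor((lat + 90.0) / cell_deg),
                math.floor((lon + 180.0) / cell_deg))
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}
```

Two data sets gridded with the same `cell_deg` share cell keys, so the intersection of their keys gives the cells where both sources have data and can be compared directly.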