Jan 12, 2004 ABC-PROJECT 1
ABC Data Management
Erwin R. Boer (Cal-IT2, UIOWA, UMN, ERB-Consulting)
Knowledge Extraction at our Fingertips: Hypothesis testing at any Scale
Joining Forces to Build an ABC Data Center in La Jolla
Jan 12, 2004 ABC-PROJECT 2
ABC Observation Stations
AUAV Mission Planning and Control Requires Real-Time Data Integration
Jan 12, 2004 ABC-PROJECT 3
Central Equatorial Pacific Experiment (CEPEX)
Indian Ocean Experiment (INDOEX)
Jan 12, 2004 ABC-PROJECT 4
ABC Goes beyond Physical Sciences
Jan 12, 2004 ABC-PROJECT 5
Serving the Scientists through Speed and Flexibility
Knowing where Data Preparation Ends and Science Starts
Providing accessibility plus controllability
• Speed - Reducing data preparation time increases chance research gets conducted
– Establishing data sets for further analysis (e.g. extraction and collocation)
– Structuring the data for further analysis (e.g. registering for data assimilation for climate models)
• Flexibility - Enhancing the data manipulation environment increases scientific production
– Manipulating data for analysis (e.g. ingest into existing tools)
– Finding structure in the data (e.g. multidimensional visualizations)
– Outlier identification (e.g. identify samples across views)
– Integrated visualization of data and analysis results (e.g. integrate results back into dataset)
– Finding similar cases in all data (e.g. example-based data mining)
• Provide immediate service and build augmented tools to continually improve old and introduce new services & tools
Jan 12, 2004 ABC-PROJECT 6
Goals Pertaining to ABC
• Begin the process of developing the ABC data center in La Jolla
• Adopt the philosophy used in data integration for CEPEX and, to a smaller degree, for INDOEX
• Use new technologies to implement these scientist-centered design principles
• Write proposal over next 8 weeks
• Identify key people in relevant areas of expertise and approach them for collaboration
• Solicit ideas and ideals
Jan 12, 2004 ABC-PROJECT 7
Data Centers, Migration, and Augmentation of Data
Archival, Mission Planning, Hypothesis Testing, and Data Mining
?
Jan 12, 2004 ABC-PROJECT 8
The Twelve Curses of Data Manipulation
Not all Scientists are Programmers but Support for many Programming Environments is Necessary
1. Collocated data availability
2. Data volume
3. Data migration
4. Plurality of data formats
5. Enforced formats
6. Multiplicity of manipulation languages
7. Enforced tools
8. Data heterogeneity
9. Inflexible data mapping tools
10. Case uniqueness for similarity mining
11. Visualization as a means versus an end
12. Human patience
Manipulation is as important as visualization, if not more so, partly due to precedence
Jan 12, 2004 ABC-PROJECT 9
http://www-c4.ucsd.edu/~cids/
Goal 1: Data conformity in format, coordinates, and meta-data.
Goal 2: Separation of spatiotemporal coordinates and data.
Jan 12, 2004 ABC-PROJECT 10
Goal 3: Subset extraction of desired spatiotemporal bounding box.
Goal 4: Fast and informative collocation between any two (extracted) data sets.
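Goals 2 and 3 can be illustrated with a minimal sketch. It assumes that coordinates are stored separately from the measurements, so a bounding-box extraction only has to scan the coordinate records. The names `Coord`, `BoundingBox`, `extract_indices`, and `extract_subset` are hypothetical and are not taken from CIDS; longitude wrap-around at the date line is ignored.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Coord:
    """One spatiotemporal sample position (hypothetical layout)."""
    time: float  # seconds since an arbitrary epoch
    lat: float   # degrees north
    lon: float   # degrees east


@dataclass
class BoundingBox:
    """Inclusive spatiotemporal limits of the desired subset."""
    t_min: float
    t_max: float
    lat_min: float
    lat_max: float
    lon_min: float
    lon_max: float

    def contains(self, c: Coord) -> bool:
        return (self.t_min <= c.time <= self.t_max
                and self.lat_min <= c.lat <= self.lat_max
                and self.lon_min <= c.lon <= self.lon_max)


def extract_indices(coords: Sequence[Coord], box: BoundingBox) -> List[int]:
    """Scan only the coordinate records and return the indices inside the box."""
    return [i for i, c in enumerate(coords) if box.contains(c)]


def extract_subset(coords, data, box):
    """Return (coords, data) restricted to the box; data is touched by index only."""
    idx = extract_indices(coords, box)
    return [coords[i] for i in idx], [data[i] for i in idx]


# Example with made-up values: the third sample lies outside the time window.
coords = [Coord(0.0, 10.0, 70.0), Coord(60.0, 10.5, 70.5), Coord(7200.0, 11.0, 71.0)]
data = [0.21, 0.34, 0.55]
box = BoundingBox(0.0, 3600.0, 9.0, 12.0, 69.0, 72.0)
sub_coords, sub_data = extract_subset(coords, data, box)
print(sub_data)
```

Because the index list is computed from coordinates alone, the same indices could be reused to pull several measurement arrays that share one coordinate set.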
http://www-c4.ucsd.edu/~cids/
Jan 12, 2004 ABC-PROJECT 11
Geostationary Satellite Image GOES-7
3D Cloud Reconstruction Under Strong Spectral Constraints
Jan 12, 2004 ABC-PROJECT 12
Interactive Visualization as a Means to Effectuate Hypothesis Verification and Falsification
Jan 12, 2004 ABC-PROJECT 14
GMS JDAY 93 - 22:32 GMT
Jan 12, 2004 ABC-PROJECT 22
Situated Assessment of Data Constraints:
Screening for Multi-Layer versus Single-Layer Cloud Systems
Jan 12, 2004 ABC-PROJECT 23
Collocation between Multiple Aircraft
Trivial but Horrendously Time Consuming if Implemented Trivially
Coordinated Flight of AUAVs
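To illustrate why a trivial implementation is so time consuming, the sketch below compares an all-pairs time match with one that binary-searches a time-sorted track. It is an assumption-laden illustration, not the method used in the project: the sample layout `(time_s, lat_deg, lon_deg)` and the function names are hypothetical, and only the temporal tolerance is checked here.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (time_s, lat_deg, lon_deg), hypothetical layout


def collocate_naive(a: List[Sample], b: List[Sample], dt: float) -> List[Tuple[int, int]]:
    """Compare every pair of samples: len(a) * len(b) time checks."""
    return [(i, j) for i, sa in enumerate(a) for j, sb in enumerate(b)
            if abs(sa[0] - sb[0]) <= dt]


def collocate_sorted(a: List[Sample], b: List[Sample], dt: float) -> List[Tuple[int, int]]:
    """Same pairs, assuming b is sorted by time so each lookup is a binary search."""
    b_times = [s[0] for s in b]
    pairs = []
    for i, sa in enumerate(a):
        lo = bisect_left(b_times, sa[0] - dt)   # first sample not earlier than t - dt
        hi = bisect_right(b_times, sa[0] + dt)  # one past the last sample not later than t + dt
        pairs.extend((i, j) for j in range(lo, hi))
    return pairs


# Made-up flight tracks of two aircraft; track_b is sorted by time.
track_a = [(0.0, 10.0, 70.0), (12.0, 10.1, 70.1), (31.0, 10.2, 70.2)]
track_b = [(1.0, 10.0, 70.0), (10.0, 10.1, 70.1), (20.0, 10.2, 70.2), (30.0, 10.3, 70.3)]
print(collocate_sorted(track_a, track_b, dt=3.0))
```

The sorted variant costs roughly len(a) * log(len(b)) comparisons plus the number of matches, which matters once each track holds a whole flight of samples.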
Jan 12, 2004 ABC-PROJECT 24
Automatic Collocation between Aircraft and Geostationary Satellite
Jan 12, 2004 ABC-PROJECT 25
Assessing Validity of Collocation Parameters
• Spatio-temporal tolerances
• Footprint matching
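One way to assess spatio-temporal tolerances is to count how many sample pairs match under each candidate setting. The sketch below assumes samples of the form `(time_s, lat_deg, lon_deg)`, uses great-circle (haversine) distance on a spherical Earth, and does not attempt footprint matching. The function names and the tolerance values are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def is_collocated(s1, s2, max_dt_s, max_dist_km):
    """True when two samples agree within both the time and distance tolerance."""
    (t1, la1, lo1), (t2, la2, lo2) = s1, s2
    return abs(t1 - t2) <= max_dt_s and haversine_km(la1, lo1, la2, lo2) <= max_dist_km


def match_counts(track_a, track_b, tolerances):
    """Count matching pairs for each (max_dt_s, max_dist_km) setting."""
    return {tol: sum(is_collocated(a, b, *tol) for a in track_a for b in track_b)
            for tol in tolerances}


# Made-up samples: tightening the tolerances reduces the number of matches.
track_a = [(0, 10.0, 70.0), (600, 10.0, 70.2)]
track_b = [(30, 10.0, 70.05), (900, 10.0, 70.2)]
print(match_counts(track_a, track_b, [(60, 10.0), (600, 50.0)]))
```

Plotting such counts against the tolerances would show where loosening them stops adding matches, which is one hedged way to judge whether a setting is reasonable.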
Jan 12, 2004 ABC-PROJECT 26
Identifying Curious Outliers
Jan 12, 2004 ABC-PROJECT 27
Scientists Struggled in their Quest to Resolve Uncertainty:
Computer Scientists Gave them Probing Tools
Jan 12, 2004 ABC-PROJECT 28
Atmospheric Brown Cloud - Project
Data Archival and Manipulation Goals
Establish Data Center in La Jolla for archival, dissemination, and augmentation of all ABC data
• DBMS – ORACLE?
– Consistent with other sites (including mirror sites)
– Different views consistent with multitude of project goals
– Estimated at 100s of TBs
• Operational - Meta data augmentation and extraction
– Textual
– Coverage (specialized data structures)
– Statistics
– Snapshots
– Visualizations
• Operational - Creation of integrated data sets
– Initialization of and comparison with models
– Hypothesis testing and data mining
• Interactive – Browsing for data suitability
– Assess what data are available when and from where (requires good visualizations)
– Establish insight into data constraints
• Interactive – Data extraction and collocation
– Extraction of specified subsets (e.g. in the field)
– Collocation of heterogeneous data sets
– Dispersal of ingest, registering, and visualization tools
• Interactive – Data Mining
– Characterizing a particular event
– Automatic search for specific conditions
– Guided exploration of outliers
• Workshop
– Scientists work with programmers to accommodate specialized needs
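The "Meta data augmentation and extraction" goal above could start from something as small as the following sketch, which derives coverage and statistics records from a data set so that browsing for suitability does not require reading the data itself. The `(time_s, lat_deg, lon_deg, value)` layout and the `summarize` function are assumptions for illustration, not part of any existing ABC tool.

```python
from statistics import mean, pstdev


def summarize(samples):
    """Build a metadata record from (time_s, lat_deg, lon_deg, value) samples."""
    times, lats, lons, values = zip(*samples)
    return {
        "count": len(samples),
        # Coverage: the spatiotemporal extent spanned by the samples.
        "coverage": {
            "time": (min(times), max(times)),
            "lat": (min(lats), max(lats)),
            "lon": (min(lons), max(lons)),
        },
        # Statistics: simple descriptors of the measured values.
        "statistics": {
            "min": min(values),
            "max": max(values),
            "mean": mean(values),
            "std": pstdev(values),
        },
    }


# Made-up samples for illustration.
meta = summarize([(0, 10.0, 70.0, 1.0), (60, 11.0, 71.0, 3.0)])
print(meta["coverage"], meta["statistics"])
```

A real coverage record would likely need the specialized data structures the slide mentions; a bounding extent is only the simplest stand-in.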
Jan 12, 2004 ABC-PROJECT 29
Research Issues
Identifying Key Players for Participation
• Database & Data Format
– Efficient separation of spatio-temporal coordinates and sensor measurements or model predictions
– Flexible data format for efficient extractions
• Extraction & Collocation
– Efficient representation & visualization of spatio-temporal data coordinates (including spatio-temporal footprint)
– Fast collocation between any number of heterogeneous spatio-temporal data structures
• Registering & Representation
– Efficient representation of co-registered data sets
– Flexible and expandable (e.g. include models) tools to map heterogeneous spatio-temporal data sets onto a common coordinate frame
• Case Representation & Mining
– Efficient representation of a particular case (i.e. augmented data state)
– Fast mining of similar cases
• Manipulation & Visualization
– Efficient means to explore data from any abstract and model-augmented perspective
– Fast integrated visualization of heterogeneous data sets
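As a baseline for the "Case Representation & Mining" item, the sketch below treats each case as a fixed-length feature vector, standardizes the features, and returns the nearest cases by Euclidean distance. This brute-force search is an assumption chosen for clarity and is not fast at scale; an index structure would be needed for that. The feature choice and function names are hypothetical.

```python
import math
from typing import List, Sequence


def standardize(cases: Sequence[Sequence[float]]) -> List[List[float]]:
    """Scale every feature to zero mean and unit variance across all cases."""
    n, dims = len(cases), len(cases[0])
    means = [sum(c[d] for c in cases) / n for d in range(dims)]
    # A constant feature has zero spread; fall back to 1.0 to avoid dividing by zero.
    stds = [math.sqrt(sum((c[d] - means[d]) ** 2 for c in cases) / n) or 1.0
            for d in range(dims)]
    return [[(c[d] - means[d]) / stds[d] for d in range(dims)] for c in cases]


def most_similar(cases, query_index, k=1):
    """Return the indices of the k cases closest to the query case."""
    z = standardize(cases)
    q = z[query_index]

    def dist(v):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(q, v)))

    others = [i for i in range(len(z)) if i != query_index]
    return sorted(others, key=lambda i: dist(z[i]))[:k]


# Made-up cases: (cloud fraction, brightness temperature in K).
cases = [[0.2, 290.0], [0.8, 275.0], [0.25, 288.0], [0.75, 277.0]]
print(most_similar(cases, query_index=0, k=2))
```

Standardizing first keeps a feature with large numeric range, such as temperature, from dominating the distance.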