Ames Research Center NAS Division 1 Experience With NASA’s Grid Miner Thomas H. Hinke NASA Ames Research Center Moffett Field, California, USA


Ames Research Center NAS Division

Experience With NASA’s Grid Miner

Thomas H. Hinke

NASA Ames Research Center

Moffett Field, California, USA


Outline

• Why use the grid for data mining?

• Overview of Grid Miner

• Experience adapting existing stand-alone miner to grid

• A recent application of the Grid Miner


Grid Provides Computational Power

• Grid couples needed computational power to data
  – NASA has a large volume of data stored in its distributed archives
    • E.g., in the Earth Science area, the Earth Observing System Data and Information System (EOSDIS) holds a large volume of data at multiple archives
  – Data archives are not designed to support user processing
  – Grids, coupled to archives, could provide such a computational capability for users


Grid Provides Re-Usable Functions

• Grid-provided functions do not have to be re-implemented for each new mining system
  – Single sign-on security
  – Ability to execute jobs at multiple remote sites
  – Ability to securely move data between sites
  – Broker to determine best place to execute mining job
  – Job manager to control mining jobs

• Mining system developers do not have to re-implement common grid services

• Mining system developers can focus on the mining applications and not the issues associated with distributed processing


Grid Will Provide Re-usable Services

• In the future, Grid/Web services will provide the ability to create reusable services that can facilitate the development of data mining systems
  – Builds on the web services work from the e-commerce area
    • Service interface is defined through WSDL (Web Services Description Language)
    • Standard access protocol is SOAP (Simple Object Access Protocol)


Grid Services: A Foundation for Grid Mining

• Global Grid Forum working groups on
  – Open Grid Services Architecture (OGSA) standard under development to specify a grid-enabled web services architecture. See “Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration”
  – Open Grid Services Infrastructure (OGSI) standard has been released. Specifies common interfaces that all grid services should support.


Grid Mining and OGSA/OGSI

• An OGSA/OGSI compliant mining service could be built

• Mining applications could be built by re-using capabilities provided by existing grid services.


Outline

• Why use the grid for data mining?

• Overview of Grid Miner

• Experience adapting existing stand-alone miner to grid

• A recent application of the Grid Miner


Grid Miner

• Developed as one of the early applications on the IPG
  – Helped debug the IPG
  – Provided basis for satisfying a major IPG milestone

• IPG (Information Power Grid) is NASA’s implementation of a Globus-based Grid

• Provides basis for what could be an on-going Grid Mining Service


Grid Miner Operations

[Figure: processing stages Input, Preprocessing, Analysis, Output; data stages Data, Translated Data, Preprocessed Data, Patterns/Models, Results]

• Input
  – HDF, HDF-EOS, GIF, PIP-2, SSM/I Pathfinder, SSM/I TDR, SSM/I NESDIS Lvl 1B, SSM/I MSFC Brightness Temp, US Rain, Landsat, ASCII Grass, Vectors (ASCII Text), Intergraph Raster, Others...

• Preprocessing
  – Selection and Sampling: Subsetting, Subsampling, Select by Value, Coincidence Search
  – Grid Manipulation: Grid Creation, Bin Aggregate, Bin Select, Grid Aggregate, Grid Select, Find Holes
  – Image Processing: Cropping, Inversion, Thresholding
  – Others...

• Analysis
  – Clustering: K Means, Isodata, Maximum
  – Pattern Recognition: Bayes Classifier, Min. Dist. Classifier
  – Image Analysis: Boundary Detection, Cooccurrence Matrix, Dilation and Erosion, Histogram Operations, Polygon Circumscript, Spatial Filtering, Texture Operations
  – Genetic Algorithms
  – Neural Networks
  – Others...

• Output
  – GIF Images, HDF-EOS, HDF Raster Images, HDF SDS, Polygons (ASCII, DXF), SSM/I MSFC Brightness Temp, TIFF Images, Others...

Figure thanks to Information and Technology Laboratory at the University of Alabama in Huntsville


Mining on the Grid

[Figure: three Grid Mining Agents, each on an IPG Processor, together with Satellite Data Archive X and Satellite Data Archive Y]


Grid Miner Architecture

[Figure: components labeled as IPG Processors: two Grid Mining Agents, Mining Database Daemon, Mining Operations Repository, Miner Config Server; other labels: Control Database, Data Archive X, Satellite Data Archive Y]


Example: Mining for Mesoscale Convective Systems

Image shows results from mining SSM/I data


Outline

• Why use the grid for data mining?

• Overview of Grid Miner

• Experience adapting existing stand-alone miner to grid

• A recent application of the Grid Miner


Starting Point for Grid Miner

• Grid Miner reused code from object-oriented ADaM data mining system
  – Developed under NASA grant at the University of Alabama in Huntsville, USA
  – Implemented in C++ as stand-alone, object-oriented mining system
    • Runs on NT, IRIX, Linux
  – Has been used to support research personnel at the Global Hydrology and Climate Center and a few other sites.

• Object-oriented nature of ADaM provided excellent base for enhancements to transform ADaM into Grid Miner


Transforming Stand-Alone Data Miner into Grid Miner

• Original stand-alone miner had 459 C++ classes.

• Had to make small modifications to ADaM
  – Modified 5 existing classes
  – Added 3 new classes

• Grid commands added for
  – Staging miner agent to remote sites
  – Moving data to mining processor


Staging Data Mining Agent to Remote Processor

globusrun -w -r target_processor '&(executable=$(GLOBUSRUN_GASS_URL)# path_to_agent)(arguments=arg1 arg2 … argN)(minMemory=500)'
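The command above can be assembled programmatically. The following is a minimal sketch, not the Grid Miner's own code: the function names are hypothetical, and only the `globusrun` flags and RSL attributes that appear on the slide (`-w`, `-r`, `executable`, `arguments`, `minMemory`) are used.

```python
import shlex


def build_rsl(agent_path, arguments, min_memory=500):
    """Assemble an RSL string with the same shape as the one on the slide.

    agent_path and arguments are placeholders supplied by the caller.
    """
    args = " ".join(arguments)
    return (
        f"&(executable=$(GLOBUSRUN_GASS_URL)# {agent_path})"
        f"(arguments={args})"
        f"(minMemory={min_memory})"
    )


def build_globusrun_argv(target_processor, rsl):
    """Return the argument vector for globusrun, using only the slide's flags."""
    return ["globusrun", "-w", "-r", target_processor, rsl]


if __name__ == "__main__":
    # Hypothetical host name and agent path, for illustration only.
    rsl = build_rsl("/path/to/agent", ["arg1", "arg2"])
    argv = build_globusrun_argv("target.example.org", rsl)
    # shlex.join quotes the RSL so the shell does not expand $(...).
    print(shlex.join(argv))
```

Passing the list to a process launcher as-is (rather than as a single shell string) avoids shell interpretation of the `$(GLOBUSRUN_GASS_URL)` substitution, which is meant for the RSL parser.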


Moving Data to be Mined

gsincftpget remote_processor local_directory remote_file
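When several files have to be moved, one command per file can be generated. This is a small sketch under the assumption that the argument order is exactly the one shown on the slide (remote processor, local directory, remote file); the host, directory, and file names are hypothetical.

```python
def build_fetch_commands(remote_processor, local_directory, remote_files):
    """Return one gsincftpget argument vector per remote file.

    The argument order follows the slide:
    gsincftpget remote_processor local_directory remote_file
    """
    return [
        ["gsincftpget", remote_processor, local_directory, remote_file]
        for remote_file in remote_files
    ]


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    commands = build_fetch_commands(
        "archive.example.org",
        "/scratch/mining",
        ["orbit_01.hdf", "orbit_02.hdf"],
    )
    for argv in commands:
        print(" ".join(argv))
```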


Outline

• Why use the grid for data mining?

• Overview of Grid Miner

• Experience adapting existing stand-alone miner to grid

• A recent application of the Grid Miner


Demonstrate Grid Support for Interdisciplinary Earth Science Research

• Goal: Combine data from two distinctly different instruments (stored on two different grid-connected mass storage systems) to produce new insights by examining observations of the same time and place from both instruments.

• Approach:
  – Use Grid Miner to mine TMI data for mesoscale convective systems.
  – Generate feature index (convex hull polygon) for all mesoscale convective systems found.
  – Transmit polygons in form of XML document to subsetter.
  – Subset CERES SSF data that corresponds to mesoscale convective systems discovered by Grid Miner
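The feature index in the approach above is a convex hull polygon. The slides do not say which hull algorithm Grid Miner uses; the sketch below uses Andrew's monotone chain as a stand-in, treats coordinates as planar (x, y) pairs, and does not handle longitude wrap-around at the antimeridian.

```python
def cross(o, a, b):
    """Z-component of the cross product of vectors OA and OB."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Return hull vertices in counterclockwise order (monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    # Build the lower hull from left to right.
    lower = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    # Build the upper hull from right to left.
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each half is the first point of the other half.
    return lower[:-1] + upper[:-1]


if __name__ == "__main__":
    # Made-up pixel coordinates standing in for one detected region.
    region = [(0, 0), (1, 0), (1, 1), (0, 1), (0.5, 0.5)]
    print(convex_hull(region))
```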


Desired Processing Pattern

[Figure labels: Data Archive; Data Archive; Network; Subsetting; Data Mining; NASA Atmospheric Sciences Data Center; Data Archived at NASA Ames; Grid Processing; To User]

Ideally Grid Miner would use grid resources co-located with the data, but if not, could use available remote grid resources


The Details

[Figure labels: Data Cache (Grid Processor); CERES Data Archive; TMI Data on Mass Store; Storage Resource Broker; Grid Mining Agent (IPG Processor); Subsetter (Grid Processor); LaRC Atmospheric Sciences Data Center; IPG; MSCP, the Miner-Subsetter Control Program (Grid Processor); Miner Executables (IPG Processor); IPG Broker (IPG Processor)]

(1) Broker selects IPG resource for mining

(2) MSCP sends mining plan

(3) MSCP transfers Grid Agent to mining site using Job Manager (not shown)

(4) Grid Agent transfers mining ops

(5) MSCP retrieves Feature Index

(6) MSCP starts Subsetter on Feature Index
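The six numbered steps on this slide can be read as a fixed sequence driven by the MSCP. The sketch below only illustrates that ordering: `StubGrid` and all of its methods are hypothetical stand-ins that record which step ran, not real IPG or Globus interfaces.

```python
class StubGrid:
    """Hypothetical stand-in for the grid services; methods only log the step."""

    def __init__(self):
        self.log = []

    def broker_select_resource(self):
        self.log.append("(1) broker selects IPG resource for mining")
        return "mining-site"  # hypothetical resource name

    def send_mining_plan(self, site, plan):
        self.log.append("(2) MSCP sends mining plan")

    def stage_agent(self, site):
        self.log.append("(3) MSCP transfers agent using job manager")

    def agent_fetch_mining_ops(self, site):
        # On the slide this step is performed by the agent itself.
        self.log.append("(4) agent transfers mining operations")

    def retrieve_feature_index(self, site):
        self.log.append("(5) MSCP retrieves feature index")
        return "<polygon_list/>"  # placeholder feature index

    def start_subsetter(self, feature_index):
        self.log.append("(6) MSCP starts subsetter on feature index")
        return []  # the stub selects no footprints


def run_mscp(grid, mining_plan):
    """Drive the steps in the order shown on the slide."""
    site = grid.broker_select_resource()
    grid.send_mining_plan(site, mining_plan)
    grid.stage_agent(site)
    grid.agent_fetch_mining_ops(site)
    feature_index = grid.retrieve_feature_index(site)
    return grid.start_subsetter(feature_index)


if __name__ == "__main__":
    grid = StubGrid()
    run_mscp(grid, "mine TMI data for mesoscale convective systems")
    print("\n".join(grid.log))
```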


Example of Data Being Mined

• 230 MB contained in 15 orbit files for one day of TMI (TRMM [Tropical Rainfall Measuring Mission] Microwave Imager) data

• Much higher resolution data exists with significantly higher volume.


Mining and Subsetting Results


Grid Miner produced an XML (Extensible Markup Language) document of polygons that circumscribe mesoscale convective systems (MCSs). The following shows a portion of the XML description, including two of the 64 vertices that make up the convex hull polygon produced for the third MCS found by the miner in TMI data for April 1, 1998:

<polygon>
  <julian_date_time> 2450904.754815 </julian_date_time>
  <human_date_time> 1998-04-01 GMT 06:06:56 </human_date_time>
  <size_in_square_km> 2083.126221 </size_in_square_km>
  <region_type> 2 </region_type>
  <vertices>
    <number_of_vertices> 64 </number_of_vertices>
    <vertex><latitude> -2.26 </latitude><longitude> -178.28 </longitude></vertex>
    <vertex><latitude> -2.08 </latitude><longitude> -178.38 </longitude></vertex>
    ...
</polygon>
</polygon_list>
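A consumer such as the subsetter has to read this document back. The following is a sketch of one way to do that with Python's standard library, not the subsetter's actual code. The sample document reuses the values from the slide but is an assumption in two respects: it adds the opening `<polygon_list>` and closing `</vertices>` tags that the excerpt elides, and it contains only the two vertices shown rather than all 64.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

# Assumed complete form of the excerpt; only two of the 64 vertices are present.
SAMPLE = """<polygon_list>
  <polygon>
    <julian_date_time> 2450904.754815 </julian_date_time>
    <human_date_time> 1998-04-01 GMT 06:06:56 </human_date_time>
    <size_in_square_km> 2083.126221 </size_in_square_km>
    <region_type> 2 </region_type>
    <vertices>
      <number_of_vertices> 64 </number_of_vertices>
      <vertex><latitude> -2.26 </latitude><longitude> -178.28 </longitude></vertex>
      <vertex><latitude> -2.08 </latitude><longitude> -178.38 </longitude></vertex>
    </vertices>
  </polygon>
</polygon_list>"""

JD_UNIX_EPOCH = 2440587.5  # Julian date of 1970-01-01 00:00 UTC


def julian_to_datetime(jd):
    """Convert a Julian date to a naive UTC datetime."""
    return datetime(1970, 1, 1) + timedelta(days=jd - JD_UNIX_EPOCH)


def parse_polygons(xml_text):
    """Return one dict per <polygon> with its time, size, and vertices."""
    root = ET.fromstring(xml_text)
    polygons = []
    for poly in root.iter("polygon"):
        vertices = [
            (float(v.findtext("latitude")), float(v.findtext("longitude")))
            for v in poly.iter("vertex")
        ]
        polygons.append({
            "time": julian_to_datetime(float(poly.findtext("julian_date_time"))),
            "size_km2": float(poly.findtext("size_in_square_km")),
            "declared_vertices": int(poly.findtext("vertices/number_of_vertices")),
            "vertices": vertices,
        })
    return polygons


if __name__ == "__main__":
    for polygon in parse_polygons(SAMPLE):
        print(polygon["time"], polygon["vertices"])
```

Converting the Julian date gives a time that agrees, to the second, with the `human_date_time` field in the excerpt, which is a useful consistency check when reading such files.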

Grid Miner produced view of area mined using TMI data.

CERES SSF footprints for April 1, 1998, hour 6, corresponding to the third MCS found by Grid Miner:

Convex Hull: 3 with 9

Footprint: 1 15804
Footprint: 2 15805
Footprint: 3 16090
Footprint: 4 16091
Footprint: 5 16094
Footprint: 6 16376
Footprint: 7 16377
Footprint: 8 16381
Footprint: 9 16382
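Selecting the footprints that correspond to a convex hull amounts to a point-in-polygon test. The slides do not describe how the subsetter performs it; the sketch below uses a standard ray-casting test on planar coordinates, with made-up footprint identifiers and positions, and it ignores longitude wrap-around at the antimeridian.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if point lies inside the polygon.

    polygon is a list of (x, y) vertices in order; point is an (x, y) pair.
    """
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line through the point count.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def select_footprints(footprints, polygon):
    """Return the ids of footprints whose position falls inside the polygon."""
    return [fid for fid, pos in footprints if point_in_polygon(pos, polygon)]


if __name__ == "__main__":
    # Hypothetical hull and footprint positions, for illustration only.
    hull = [(0, 0), (2, 0), (2, 2), (0, 2)]
    footprints = [("A", (1, 1)), ("B", (3, 1))]
    print(select_footprints(footprints, hull))
```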


Current Status

• Currently works on the IPG as a prototype system

• User documentation underway

• Data archives need to be grid-enabled
  – Connected to the grid
  – Provide controlled access to data on tertiary storage
    • E.g., by using a system such as the Storage Resource Broker that was developed at the San Diego Supercomputer Center

• Some early-adopter users need to be found to begin using the Grid Miner
  – Willing to code any new operations needed for their applications
  – Willing to work with a system that has prototype-level documentation
