Designing Services for Grid-based Knowledge Discovery

A. Congiusta, A. Pugliese, Domenico Talia, P. Trunfio

DEIS, University of Calabria, ITALY

talia@deis.unical.it

Future Generation Grids, Dagstuhl Seminar, November 2004


SUMMARY

The use of computers is changing the way we make discoveries and is improving both the speed and the quality of discovery processes.

In this scenario, the Grid can provide effective computational support for knowledge discovery from large, distributed data sets. To this end, we designed a system called the Knowledge Grid.

This talk discusses how to design distributed knowledge discovery services according to the OGSA model by using the Knowledge Grid services: from searching Grid resources and composing software and data elements to executing the resulting application on a Grid.


OUTLINE

MOTIVATIONS

TOWARDS KNOWLEDGE SERVICES

THE KNOWLEDGE GRID

OGSA SERVICES FOR KNOWLEDGE DISCOVERY

A META-LEARNING EXAMPLE

CONCLUSIONS


MOTIVATIONS

Lots of data collected and warehoused.

Data collected and stored at enormous speeds in local databases, from remote sources, or from the sky.

Scientific simulations generating terabytes of data.

Huge data sets are hard to understand.

Traditional techniques are infeasible for raw data.

Computational science is evolving toward data-intensive applications that include

• data analysis,
• information management, and
• knowledge discovery.


MOTIVATIONS

Most data will never be examined by humans; it is analyzed and summarized by computers.

Data analysis is becoming a key element in scientific discovery and in business processes.

Data-intensive applications are those that explore, query, analyze, visualize, and, in general, process very large-scale data sets.

Data-intensive applications help

• scientists in hypothesis formation;

• companies in providing better, customized services and in supporting decision making.


SCIENTIFIC OBJECTIVES

The objective is to support the unification of data management and knowledge discovery systems with Grid technologies in order to provide knowledge-based Grid services.

This objective can be achieved through

• the development of techniques and tools for supporting data-intensive applications, and

• the integration of Data and Computation Grids with Information and Knowledge Grids.

TOWARDS KNOWLEDGE SERVICES

Grid-aware Knowledge Discovery Systems


THE KNOWLEDGE GRID (PAST)

KNOWLEDGE GRID: a distributed knowledge discovery architecture that integrates data mining techniques and computational Grid resources.

In the KNOWLEDGE GRID architecture, data mining tools are integrated with lower-level Grid mechanisms and services, and exploit Data Grid services.

This approach benefits from "standard" Grid services and offers an open architecture that can be configured on top of generic Grid middleware.


KNOWLEDGE GRID ARCHITECTURE (PAST)

High-level K-Grid layer

• DAS: Data Access Service
• TAAS: Tools and Algorithms Access Service
• EPMS: Execution Plan Management Service
• RPS: Result Presentation Service

Core K-Grid layer

• KDS: Knowledge Directory Service
• RAEMS: Resource Allocation and Execution Management Service

Metadata repositories: KMR (resource metadata), KEPR (execution plan metadata), KBR (model metadata)

Both layers are built on top of Generic and Data Grid Services.


THE KNOWLEDGE GRID (PAST / FUTURE)

[Figure: resources labelled D1-D4, S1-S3 and H1-H3 pass through the stages Component Selection / Service Selection, Application Workflow Composition, and Application Execution on the Grid.]


OGSA KNOWLEDGE GRID SERVICES (FUTURE)

The KNOWLEDGE GRID is an abstract service-based Grid architecture that does not limit the user in developing and using service-based knowledge discovery applications.

We are defining a set of Grid Services that export the functionality and operations of the KNOWLEDGE GRID.

Each of the KNOWLEDGE GRID services is exposed as a persistent service, using the OGSA conventions and mechanisms.


KNOWLEDGE SERVICES: A Meta-Learning Example

A simple example of a meta-learning process over the KNOWLEDGE GRID.

It shows how the execution of a significant distributed data mining application can benefit from the Knowledge Grid services provided through the OGSA model.

Meta-learning aims to generate a number of independent classifiers by applying learning programs to a collection of distributed data sets in parallel.

The classifiers computed by learning programs are then collected and combined to obtain a global classifier.
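The meta-learning scheme above can be illustrated with a small, single-process Python sketch. It is not the Knowledge Grid implementation: the data are assumed to be (feature, label) pairs with one numeric feature and binary labels, `train_stump` is a hypothetical learner (a one-threshold decision stump), threads stand in for the Grid nodes that run the learners in parallel, and the combiner uses plain majority voting, which is only one of several possible combining strategies.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Tuple

Example = Tuple[float, int]          # (feature value, class label 0/1)
Classifier = Callable[[float], int]  # maps a feature value to a label


def train_stump(training_set: List[Example]) -> Classifier:
    """Hypothetical learner: choose the threshold that best separates 0 from 1."""
    best_threshold, best_correct = 0.0, -1
    for threshold, _ in training_set:
        correct = sum((x >= threshold) == bool(y) for x, y in training_set)
        if correct > best_correct:
            best_threshold, best_correct = threshold, correct
    return lambda x, t=best_threshold: int(x >= t)


def meta_learn(partitions: List[List[Example]]) -> Classifier:
    # Learn one independent classifier per data partition, in parallel.
    with ThreadPoolExecutor() as pool:
        classifiers = list(pool.map(train_stump, partitions))

    # Combine the partial classifiers into a global one by majority vote.
    def global_classifier(x: float) -> int:
        votes = Counter(c(x) for c in classifiers)
        return votes.most_common(1)[0][0]

    return global_classifier
```

For example, with three small partitions such as `[(1.0, 0), (2.0, 0), (6.0, 1), (7.0, 1)]`, each stump learns a threshold near 6 and the global classifier labels 0.0 as class 0 and 9.0 as class 1. Because the learners never communicate, they can run on different nodes without coordination until the combining step.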


KNOWLEDGE SERVICES: A Meta-Learning Example

Step 1: on NodeA, the Partitioner P splits the DataSet DS into the training sets TR1, ..., TRn, a validation set VS and a testing set TS.

Step 2: on each Nodei (Node1, ..., Noden), the Learner Li builds the Classifier Ci from the TrainingSet TRi.

Step 3: on NodeZ, the Combiner/Tester CT combines the classifiers C1, ..., Cn into the Global Classifier GC, using VS and TS.


KNOWLEDGE SERVICES: A Meta-Learning Example

A user application interacts with Knowledge Grid nodes to generate a classifier by combining the classifiers built from different subsets of a given data set.

The scenario comprises five nodes:

• NU, running the user application that builds the meta-learning application and visualizes the global classifier;

• NS, used for resource discovery and for steering the execution of the meta-learning application;

• NA, which stores the original data set and provides a data partitioning service;

• NC, providing learning services that run in parallel on a homogeneous cluster;

• NZ, providing a combiner/tester service used to compute the global classifier.


RESOURCE DISCOVERY AND EXECUTION PLANNING

The user application invokes the DAS and TAAS services on node Ns, specifying the required resources: two nodes providing services for the meta-learning process (a learner and a combiner/tester) and for resource reservation.

[Figure: nodes NU (User Application), NS (DAS, TAAS, EPMS), NA (DAS, Database Service, Partitioner Factory), NC (DAS, TAAS, Learner Factory, reservation factory) and NZ (DAS, TAAS, Combiner Factory, reservation factory).]

The DAS and TAAS services of node Ns invoke the corresponding services on other Knowledge Grid nodes to obtain information about the needed resources. The contacted nodes reply to node Ns by sending meta-information.

On node Ns, the meta-information about nodes Nc and Nz is analyzed, and these nodes are identified as candidates for the computation. The DAS and TAAS services on node Ns send this information to the user application.
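The discovery step can be pictured as a filter over the meta-information returned by the contacted nodes. In this sketch, the `NodeMetadata` record, the service names and the `find_candidates` helper are all hypothetical; the slides do not specify the metadata format actually exchanged by the DAS and TAAS services.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class NodeMetadata:
    """Hypothetical meta-information a node sends back to the DAS/TAAS on Ns."""
    name: str
    services: Set[str] = field(default_factory=set)


def find_candidates(replies: List[NodeMetadata],
                    required: Dict[str, Set[str]]) -> Dict[str, str]:
    """Map each required role to the first node offering all of its services."""
    selected: Dict[str, str] = {}
    for role, needed in required.items():
        for node in replies:
            if needed <= node.services:  # subset test: node offers everything needed
                selected[role] = node.name
                break
    return selected
```

With replies from NA (partitioner, database), NC (learner, reservation) and NZ (combiner/tester, reservation), asking for a learner role and a combiner/tester role, each together with reservation, selects NC and NZ as the candidates. Roles that no node can satisfy are simply absent from the result, so the caller can detect them.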

The application builds an execution plan for the meta-learning process, specifying strategies for data movement and algorithm execution. The execution plan is submitted to the EPMS of node Ns.
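An execution plan of this kind is essentially a dependency graph of data-movement and execution steps. The sketch below assumes a hypothetical `PLAN` whose step names paraphrase the steps of this meta-learning example, and uses Python's standard `graphlib` module (Python 3.9+) to derive an order in which an EPMS could run them; it is not the Knowledge Grid's actual plan format.

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set, Tuple

# Hypothetical plan: step -> (node or transfer it involves, steps it depends on).
PLAN: Dict[str, Tuple[str, Set[str]]] = {
    "create_services":        ("NA,NC,NZ", set()),
    "partition":              ("NA", {"create_services"}),
    "transfer_training_sets": ("NA->NC", {"partition"}),
    "transfer_test_val_sets": ("NA->NZ", {"partition"}),
    "learn":                  ("NC", {"transfer_training_sets"}),
    "transfer_classifiers":   ("NC->NZ", {"learn"}),
    "combine_and_test":       ("NZ", {"transfer_classifiers", "transfer_test_val_sets"}),
    "deliver_result":         ("NZ->NU", {"combine_and_test"}),
}


def schedule(plan: Dict[str, Tuple[str, Set[str]]]) -> List[str]:
    """Return an order in which every step runs after all of its dependencies.

    graphlib raises graphlib.CycleError if the plan contains a cycle.
    """
    graph = {step: deps for step, (_, deps) in plan.items()}
    return list(TopologicalSorter(graph).static_order())
```

Representing the plan as a graph rather than a fixed list leaves independent steps (here, the two transfers out of NA) free to run concurrently.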



KDD APPLICATION EXECUTION

The EPMS invokes the factories on Na, Nc and Nz, requesting the creation of a partitioner service on node Na and of two reservation services on Nc and Nz. On node Nc, computing cycles are reserved on each computing element to execute the learner programs, and storage space is reserved to hold the subsets extracted from DS and the partial classifiers. On node Nz, storage space is reserved to hold the partial and global classifiers.

[Figure: nodes NU, NS, NA, NC and NZ with their Knowledge Grid services and factories.]
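A reservation service of the kind created on Nc and Nz can be sketched as all-or-nothing bookkeeping of storage and computing capacity. The class name, the units (MB, CPUs) and the capacities are assumptions for illustration only; real reservations would go through the underlying Grid middleware, and this sketch is not thread-safe.

```python
from dataclasses import dataclass


@dataclass
class ReservationService:
    """Hypothetical per-node reservation service (capacities are illustrative)."""
    free_storage_mb: int
    free_cpus: int

    def reserve(self, storage_mb: int = 0, cpus: int = 0) -> bool:
        """Reserve storage and/or CPUs together; refuse if either is insufficient."""
        if storage_mb < 0 or cpus < 0:
            raise ValueError("amounts must be non-negative")
        if storage_mb > self.free_storage_mb or cpus > self.free_cpus:
            return False  # nothing is reserved on failure
        self.free_storage_mb -= storage_mb
        self.free_cpus -= cpus
        return True

    def release(self, storage_mb: int = 0, cpus: int = 0) -> None:
        """Return previously reserved capacity to the pool."""
        self.free_storage_mb += storage_mb
        self.free_cpus += cpus
```

In the example's terms, Nc would reserve both CPUs (one per computing element) and storage, while Nz would reserve storage only.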


KDD APPLICATION EXECUTION

The requests made by the EPMS result in the creation of the requested services.

[Figure: nodes NU, NS, NA, NC and NZ with their Knowledge Grid services and factories, plus the newly created Partitioner Service on NA and Reservation Services on NC and NZ.]


KDD APPLICATION EXECUTION

The partitioner service interacts with the database service on the same node to extract the needed subsets from DS: n training sets, a testing set and a validation set.

[Figure: nodes NU, NS, NA, NC and NZ with their Knowledge Grid services and factories, plus the Partitioner Service on NA and Reservation Services on NC and NZ.]
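The partitioning step can be sketched as a reproducible shuffle followed by a split. The `partition` function, its default validation and testing fractions, and the round-robin assignment of the remaining rows to the n training sets are illustrative assumptions; the slides do not say how DS is actually divided.

```python
import random
from typing import Any, Dict, Sequence


def partition(ds: Sequence[Any], n: int, val_frac: float = 0.1,
              test_frac: float = 0.1, seed: int = 0) -> Dict[str, list]:
    """Split DS into n disjoint training sets, a validation set and a testing set."""
    if n < 1 or val_frac + test_frac >= 1:
        raise ValueError("need n >= 1 and val_frac + test_frac < 1")
    rows = list(ds)
    random.Random(seed).shuffle(rows)            # reproducible shuffle
    n_val = int(len(rows) * val_frac)
    n_test = int(len(rows) * test_frac)
    validation = rows[:n_val]
    testing = rows[n_val:n_val + n_test]
    rest = rows[n_val + n_test:]
    training = [rest[i::n] for i in range(n)]    # round-robin into n subsets
    return {"training": training, "validation": validation, "testing": testing}
```

For 100 rows and n = 4, this yields four training sets of 20 rows each, plus 10 validation and 10 testing rows, with no row appearing in two subsets.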


KDD APPLICATION EXECUTION

The EPMS invokes (i) the DAS service on node Na, requesting the transfer of the training sets to node Nc and of the testing and validation sets to node Nz, and (ii) the learner factory on Nc, requesting the creation of n learner service instances to run on the same node.

[Figure: nodes NU, NS, NA, NC and NZ with their Knowledge Grid services and factories, plus the Partitioner Service on NA and Reservation Services on NC and NZ.]
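The factory mechanism used here, in which a factory creates service instances on request, can be sketched as follows. The `Factory` class and its handle format `node/kind/number` are hypothetical; they only mimic the idea of OGSA factories returning a handle for each new transient service instance and are not the OGSI or Globus Toolkit API.

```python
import itertools
from typing import Callable, Dict, Generic, TypeVar

S = TypeVar("S")  # type of the service instances a factory creates


class Factory(Generic[S]):
    """Creates service instances on one node and returns a handle for each."""

    def __init__(self, node: str, kind: str, make: Callable[[], S]) -> None:
        self.node, self.kind, self._make = node, kind, make
        self._ids = itertools.count(1)
        self.instances: Dict[str, S] = {}

    def create(self) -> str:
        # Hypothetical handle format: "<node>/<kind>/<sequence number>".
        handle = f"{self.node}/{self.kind}/{next(self._ids)}"
        self.instances[handle] = self._make()
        return handle

    def destroy(self, handle: str) -> None:
        """Remove an instance once it is no longer needed."""
        del self.instances[handle]
```

Asking the learner factory on NC for n = 3 instances would return the handles `NC/learner/1`, `NC/learner/2` and `NC/learner/3`, which the EPMS can then use to address each learner.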


KDD APPLICATION EXECUTION

On node Nc, n learner service instances are created. On each computing element of node Nc, the learner service instances generate the partial classifiers. As soon as each partial classifier is obtained, a notification message is sent to the EPMS.

[Figure: nodes NU, NS, NA, NC and NZ with their Knowledge Grid services and factories, plus the Partitioner Service on NA, Reservation Services on NC and NZ, and the Learner Services on NC.]
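The notify-on-completion behaviour can be sketched with `concurrent.futures.as_completed`, which yields futures in the order they finish rather than the order they were submitted. Here `learn` and `notify` are caller-supplied placeholders (assumptions): `learn` stands for a learner service instance and `notify` for the notification message delivered to the EPMS.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Sequence, TypeVar

D = TypeVar("D")  # a training set
M = TypeVar("M")  # a partial classifier (model)


def run_learners(training_sets: Sequence[D],
                 learn: Callable[[D], M],
                 notify: Callable[[int, M], None]) -> Dict[int, M]:
    """Run one learner per training set; call notify as each one finishes."""
    results: Dict[int, M] = {}
    with ThreadPoolExecutor() as pool:
        # Map each future back to the index of the training set it works on.
        futures = {pool.submit(learn, ts): i for i, ts in enumerate(training_sets)}
        # as_completed yields futures in completion order, not submission order.
        for future in as_completed(futures):
            i = futures[future]
            results[i] = future.result()  # re-raises if that learner failed
            notify(i, results[i])         # notification message to the EPMS
    return results
```

Because each notification is sent as soon as its learner finishes, the EPMS can start transferring early classifiers to Nz before the slowest learner completes.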


KDD APPLICATION EXECUTION

The EPMS invokes (i) the DAS service on node Nc, requesting the transfer of the generated classifiers to node Nz, and (ii) the combiner/tester factory on Nz, requesting the creation of a combiner/tester service to run on the same node.

[Figure: nodes NU, NS, NA, NC and NZ with their Knowledge Grid services and factories, plus the Partitioner Service on NA, Reservation Services on NC and NZ, and the Learner Services on NC.]


KDD APPLICATION EXECUTION

On node Nz, a combiner/tester service is created to perform the combining and testing processes and to generate the global classifier GC.

[Figure: nodes NU, NS, NA, NC and NZ with their Knowledge Grid services and factories, plus the Partitioner Service on NA, Reservation Services on NC and NZ, the Learner Services on NC, and the Combiner Service on NZ.]
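The slides do not specify how the combiner/tester merges the partial classifiers. As one assumed strategy, the sketch below weights each classifier Ci by its accuracy on the validation set VS, combines the classifiers by weighted voting into GC, and reports the accuracy of GC on the testing set TS. The function names and the (feature, label) data format are hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

Example = Tuple[float, int]          # (feature value, class label)
Classifier = Callable[[float], int]


def accuracy(clf: Classifier, data: Sequence[Example]) -> float:
    """Fraction of examples in data that clf labels correctly."""
    return sum(clf(x) == y for x, y in data) / len(data)


def combine_and_test(classifiers: List[Classifier],
                     validation: Sequence[Example],
                     testing: Sequence[Example]) -> Tuple[Classifier, float]:
    # Weight each partial classifier by its accuracy on the validation set VS.
    weights = [accuracy(c, validation) for c in classifiers]

    def global_classifier(x: float) -> int:
        score: Dict[int, float] = defaultdict(float)
        for c, w in zip(classifiers, weights):
            score[c(x)] += w
        return max(score, key=score.get)  # label with the highest weighted vote

    # Report the accuracy of the global classifier GC on the testing set TS.
    return global_classifier, accuracy(global_classifier, testing)
```

Weighting by validation accuracy lets a badly trained partial classifier (for instance, one learned from an unrepresentative subset) contribute little to GC, while TS, kept apart from both training and weighting, gives an unbiased check of the result.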


KDD APPLICATION EXECUTION

The EPMS invokes the DAS service on node Nz, requesting the transfer of the generated global classifier to node Nu.

[Figure: nodes NU, NS, NA, NC and NZ with their Knowledge Grid services and factories, plus the Partitioner Service on NA, Reservation Services on NC and NZ, the Learner Services on NC, and the Combiner Service on NZ.]


OPEN ISSUES (FUTURE)

• Data privacy and security
• KDD process state management
• Complex processing patterns (Web Services are too simple to express distributed data mining processes and applications)
• KDD Grid Service standards (towards OGSA-KDAI?)
• KDD processes as G-Services workflows
• Asynchronous services
• …


CONCLUSIONS

The knowledge-building process in a distributed setting involves data and information collection, generation, and distribution, followed by the collective interpretation of processed information into “knowledge.”

Next-generation Grids must be able to produce, use, and deploy knowledge as a basic element of advanced applications.

Knowledge-based Grids can offer tools, components and services to support data analysis, inference, and discovery in scientific and business applications.

OGSA-based services for distributed knowledge discovery are a key element in providing broad support for e-science and e-business.


CREDITS:

M. Cannataro, C. Comito

THANKS

www.icar.cnr.it/kgrid