www.gridminer.org … Intelligent Grid Solutions

GridMiner: A Framework for Knowledge Discovery on the Grid – from a Vision to Design and Implementation

Peter Brezany, Ivan Janciak, Alexander Wöhrer, A Min Tjoa

University of Vienna, Institute for Software Science

email: [email protected]

CGW'04, 13. Dec. 04

GridMiner Overview

Start: Jan. 2003
Hosts: University of Vienna, Vienna University of Technology
Target: provide tools to discover and access relevant knowledge and information from different distributed and heterogeneous data sources
Test application area: medical treatment of traumatic brain injury (predicting the outcome of seriously ill patients)
The analytical part focuses on data mining and On-Line Analytical Processing (OLAP)


Project members

Project leaders: Prof. A Min Tjoa, Vienna University of Technology; Prof. Peter Brezany, University of Vienna
Visualization: Radoslav Ivanov
Data streaming: Nguyen Manh Tho
OLAP: Bernhard Fiser, Umut Onan, Ibrahim Elsayed
Data mediation: Alexander Wöhrer
Knowledge management: Ivan Janciak
Job control: Günter Kickinger
Sequence rules: Michael Rinner
Clustering: Markus Mayer
Decision rules: Christian Kloner, Juergen Hofer
GUI: Paul Panhofer
Autonomic aspects: Michael Bergmann


Outline

Motivation/Requirements
GridMiner Services
Architecture
Dynamic Service Composition Engine
OLAP
Knowledge base
Data Integration
Graphical user interface
Implementation
Summary


The process to cover

Data is distributed over participating hospitals and accessed from different platforms (handheld, PC, …) for data generation, querying, and analysis
The process needs to access various data sources


GridMiner Motivation

Integrate knowledge discovery and knowledge management as an autonomic system
Manage and control the whole lifecycle of knowledge
Give strong support to other intelligent entities in their needs for knowledge

Basic Requirements

Ability to access and analyze a huge amount of information, typically heterogeneous and geographically distributed
Intelligent behavior: ability to maintain, discover, extend, present and communicate knowledge
High-performance (real-time or soft real-time) query processing
High security guarantee


GridMiner Services

Dynamic Workflow Control Service
Data mining services: sequences (SPADE), clustering (SimpleKMeans), decision rules (SPRINT)
OLAP (sequential/parallel version)
Association rules on OLAP
Grid Data Mediator Service


GridMiner Architecture

[Figure: GridMiner architecture. Layer labels: Grid, Web, User environment. Components: Graphical User Interface, DSCE Client, Knowledge Base, Service configuration, Dynamic service control engine (DSCE), Data Access and Integration, Data mining services]


Dynamic Service Control Engine

Processes a workflow described by DSCL
Based on the Open Grid Services Architecture
Supports both interactive and batch processing
User-independent processing of the workflow
Provision of all intermediate results from the involved services
Full user control during workflow execution
Supports the OGSA Notification Model
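
The behaviour listed above (processing a workflow step by step, keeping every intermediate result, and notifying interested parties) can be illustrated with a minimal sketch. It is not the DSCE implementation and does not use DSCL syntax: the `Workflow` class, the step names, the sample rows, and the listener callback are hypothetical stand-ins chosen only for illustration.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stand-in for a workflow engine; not the actual DSCE or DSCL.
Step = Tuple[str, Callable[[Any], Any]]
Listener = Callable[[str, Any], None]


class Workflow:
    def __init__(self, steps: List[Step]) -> None:
        self.steps = steps
        self.results: Dict[str, Any] = {}  # intermediate result of every step
        self.listeners: List[Listener] = []

    def subscribe(self, listener: Listener) -> None:
        """Register a callback that is notified after each step finishes."""
        self.listeners.append(listener)

    def run(self, data: Any) -> Any:
        """Run the steps in order, feeding each step the previous output."""
        for name, activity in self.steps:
            data = activity(data)
            self.results[name] = data
            for listener in self.listeners:
                listener(name, data)
        return data


def select_rows(rows):
    # Illustrative activity: keep only adult patients.
    return [r for r in rows if r["age"] >= 18]


def count_rows(rows):
    # Illustrative activity: count the rows that were selected.
    return len(rows)


workflow = Workflow([("select", select_rows), ("count", count_rows)])
events = []
workflow.subscribe(lambda name, result: events.append(name))
final = workflow.run([{"age": 12}, {"age": 40}, {"age": 65}])
```

After the run, `workflow.results` holds the output of each step by name, which mirrors the "provision of all intermediate results" item, while the listener list plays the role that notifications play in the slide.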


Dynamic Service Control Engine (cont.)


Knowledge Base

[Figure: knowledge base structure]
Layers: Metadata; ontologies (Domain Ontology, Activity Ontology, Data mining Ont., Data source Ont.); Rules; Facts
Technologies: XML, XML Schema (XSL) (WebRowSet, PMML…); Web Ontology Language OWL + OWL-S; SWRL; OWL


OLAP

Multidimensional data analysis by sequential and distributed/parallel OLAP engines
Cube construction and querying
Representation of query results by OLAP Modeling Markup Language
Integration with data mining engines (association rules on OLAP)
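
As a rough illustration of what cube construction and querying mean, the following sequential sketch sums a measure for every combination of kept and rolled-up dimensions. It makes no claim about how the GridMiner OLAP engines work; the function name `build_cube`, the `ALL` marker, the field names, and the sample facts are assumptions made up for this example.

```python
from collections import defaultdict
from itertools import combinations

ALL = "*"  # marker for a rolled-up (aggregated-away) dimension


def build_cube(rows, dimensions, measure):
    """Sum `measure` for every combination of kept / rolled-up dimensions."""
    cube = defaultdict(float)
    for row in rows:
        for k in range(len(dimensions) + 1):
            for kept in combinations(dimensions, k):
                key = tuple(row[d] if d in kept else ALL for d in dimensions)
                cube[key] += row[measure]
    return dict(cube)


# Made-up sample facts; field names are illustrative only.
facts = [
    {"hospital": "A", "outcome": "good", "patients": 3},
    {"hospital": "A", "outcome": "poor", "patients": 1},
    {"hospital": "B", "outcome": "good", "patients": 2},
]
cube = build_cube(facts, ["hospital", "outcome"], "patients")
```

Querying the cube is then a dictionary lookup: `cube[("A", ALL)]` gives the total for hospital A across all outcomes, and `cube[(ALL, ALL)]` gives the grand total.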


Grid Data Mediation Service: Principles

Tight federation: global (relational) schema
Virtual integration: leave the data where it is, so the data is always up to date
No proprietary solution: inherit well-solved aspects from OGSA-DAI
Not bound to a special architecture
Supported data sources: RDBMS (via JDBC), XMLDB (Xindice), CSV files
Operators: "union all" and "inner join"
Operators are XQuery-based (using SAXON)
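
The semantics of the two operators can be shown with a short sketch over in-memory rows. The slides state that the actual operators are XQuery-based and run on SAXON; the Python functions below only illustrate what "union all" and "inner join" compute, and the source names, field names, and sample rows are made up.

```python
def union_all(*sources):
    """Concatenate rows from all sources, keeping duplicates (UNION ALL)."""
    return [row for source in sources for row in source]


def inner_join(left, right, key):
    """Pair rows whose `key` values match; unmatched rows are dropped."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


# Made-up rows from three hypothetical sources.
site_a = [{"id": 10, "p_name": "Ann Lee"}]
site_b = [{"id": 11, "p_name": "Bob Ray"}, {"id": 10, "p_name": "Ann Lee"}]
scores = [{"id": 10, "score": 7}]

patients = union_all(site_a, site_b)
joined = inner_join(patients, scores, "id")
```

Note that "union all" keeps the duplicate row for patient 10, so the join returns that patient twice; removing duplicates would be a separate step.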


Data Integration Scenario

Heterogeneities: the name in A is "First Last" (the target format); the name in C has to be combined
Distribution: 3 data sources


Data Integration Scenario (cont.)

Query: SELECT p_name FROM patient WHERE id=10

[Figure: the query translated to a standard and to an optimized form]
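
The scenario can be sketched as follows: each source gets a mapping into the global `patient` schema, and the query is answered over the virtual union of the mapped rows. Only the name formats of sources A and C come from the slides; the third source is not described there and is left out, and all field names, sample rows, and function names are hypothetical. The sketch does not reproduce the standard or optimized query forms shown in the figure.

```python
# Hypothetical source layouts: only the name formats of A and C come from
# the slides; field names and sample rows are made up for illustration.
source_a = [{"id": 10, "name": "Ann Lee"}]
source_c = [{"id": 11, "first": "Bob", "last": "Ray"}]


def map_a(row):
    # Source A already stores the name in the target "First Last" format.
    return {"id": row["id"], "p_name": row["name"]}


def map_c(row):
    # Source C stores the parts separately, so they have to be combined.
    return {"id": row["id"], "p_name": f'{row["first"]} {row["last"]}'}


def patient():
    """Virtual `patient` relation: rows are mapped on demand, never copied."""
    for row in source_a:
        yield map_a(row)
    for row in source_c:
        yield map_c(row)


def select_p_name(patient_id):
    """Equivalent of: SELECT p_name FROM patient WHERE id = <patient_id>."""
    return [row["p_name"] for row in patient() if row["id"] == patient_id]


result_10 = select_p_name(10)
result_11 = select_p_name(11)
```

Because `patient()` is a generator over the live sources, nothing is materialized in a central store, which matches the virtual-integration principle of leaving the data where it is.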


Implementation/Technology

Globus 3.2
OGSA-DAI
GUI: workflow construction and results visualization (JGraph, Java Web Start, JavaServer Pages)
Service configuration (JavaServer Pages/PHP/..)
Knowledge base (XML, OWL)


Data Mining Scenario

[Figure: workflow over a database of 100k rows; selections of 10k rows and 20k rows feed the decision rules services (SPRINT, C45)]


Graphical User Interface


Summary

Integrated data mining infrastructure: covers the whole process, service-oriented architecture, implemented prototype
Project ongoing: new data mining tasks (algorithms), knowledge management
More information: http://www.gridminer.org


Thank you

Questions?