www.gridminer.org … Intelligent Grid Solutions
GridMiner A Framework for Knowledge Discovery on the Grid – from a Vision to Design and Implementation
Peter Brezany, Ivan Janciak, Alexander Wöhrer, A Min Tjoa
University of Vienna
Institute for Software Science
email: [email protected]
CGW'04, 13. Dec. 04 2
GridMiner Overview
Start: Jan. 2003
Hosts: University of Vienna, Vienna University of Technology
Target: provide tools to discover and access relevant knowledge and information from different distributed and heterogeneous data sources
Test application area: medical treatment of traumatic brain injury
  Predicting the outcome of seriously ill patients
  The analytical part focuses on data mining and On-Line Analytical Processing (OLAP)
Project members
Project leaders: Prof. A Min Tjoa, Vienna University of Technology; Prof. Peter Brezany, University of Vienna
Visualization: Radoslav Ivanov
Data streaming: Nguyen Manh Tho
OLAP: Bernhard Fiser, Umut Onan, Ibrahim Elsayed
Data mediation: Alexander Wöhrer
Knowledge Mgt: Ivan Janciak
Job Control: Günter Kickinger
Sequence Rules: Michael Rinner
Clustering: Markus Mayer
Decision rules: Christian Kloner, Juergen Hofer
GUI: Paul Panhofer
Autonomic aspects: Michael Bergmann
Outline
Motivation / Requirements
GridMiner Services
Architecture
Dynamic Service Composition Engine
OLAP
Knowledge base
Data Integration
Graphical user interface
Implementation
Summary
The process to cover
Data distributed over participating hospitals
Access from different platforms (handheld, PC, …) for data generation, querying, analysis
The process needs to access various data sources
GridMiner Motivation
Integrate knowledge discovery and knowledge management as an autonomic system
Manage and control the whole lifecycle of knowledge
Give strong support to other intelligent entities in their needs for knowledge
Basic Requirements
Ability to access and analyze a huge amount of information, typically heterogeneous and geographically distributed
Intelligent behavior: ability to maintain, discover, extend, present and communicate knowledge
High performance (real-time or soft real-time) query processing
High security guarantees
GridMiner Services
Dynamic Workflow Control Service
Data mining services
  Sequences (SPADE)
  Clustering (SimpleKMeans)
  Decision rules (SPRINT)
OLAP (sequential/parallel version)
Association rules on OLAP
Grid Data Mediator Service
GridMiner Architecture
[Figure: architecture diagram. Labels: Graphical User Interface; Knowledge Base; Service configuration; Dynamic service control engine (DSCE); Data Access and Integration; Data mining services; GridWeb; User environment; DSCE Client]
Dynamic Service Control Engine
Processes a workflow described by DSCL
Based on the Open Grid Services Architecture
Supports both interactive and batch processing
User-independent processing of the workflow
Provision of all intermediate results from the involved services
Full user control during workflow execution
Supports the OGSA Notification Model
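The engine behavior listed above can be illustrated with a minimal Python sketch. This is not GridMiner's DSCE and does not use DSCL (its syntax is not shown in these slides); `WorkflowEngine`, the activity names, and the sample rows are all hypothetical. The sketch runs a linear workflow either step by step (interactive) or in one go (batch), keeps every intermediate result, and notifies subscribed listeners after each activity.

```python
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical types: an activity maps the previous result to a new one,
# a listener receives (activity name, result) after each activity.
Activity = Callable[[Any], Any]
Listener = Callable[[str, Any], None]


class WorkflowEngine:
    """Toy engine: runs named activities in order and records each result."""

    def __init__(self, steps: List[Tuple[str, Activity]]) -> None:
        self.steps = steps
        self.results: Dict[str, Any] = {}   # intermediate results by activity name
        self.listeners: List[Listener] = []
        self.position = 0                   # index of the next activity to run

    def subscribe(self, listener: Listener) -> None:
        self.listeners.append(listener)

    def finished(self) -> bool:
        return self.position >= len(self.steps)

    def step(self, data: Any) -> Any:
        """Interactive mode: run exactly one activity and notify listeners."""
        name, activity = self.steps[self.position]
        result = activity(data)
        self.results[name] = result
        self.position += 1
        for listener in self.listeners:
            listener(name, result)
        return result

    def run(self, data: Any) -> Any:
        """Batch mode: run all remaining activities without user interaction."""
        while not self.finished():
            data = self.step(data)
        return data


events: List[str] = []
engine = WorkflowEngine([
    ("select", lambda rows: [r for r in rows if r["age"] >= 18]),
    ("count", lambda rows: len(rows)),
])
engine.subscribe(lambda name, result: events.append(name))
final = engine.run([{"age": 12}, {"age": 30}, {"age": 45}])
print(final, engine.results["select"], events)
```

Calling `step()` repeatedly instead of `run()` lets a caller inspect `results` between activities, which is one simple way to model user control during execution.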
Knowledge Base
[Figure: knowledge base diagram. Labels: Metadata; Domain Ontology; Activity Ontology; Data mining Ont.; Data source Ont.; Rules; Facts]
Formats: XML, XML Schema (XSL) (webrowset, pmml…); Web Ontology Language OWL + OWL-S; SWRL; OWL
OLAP
Multidimensional data analysis by sequential and distributed / parallel OLAP engines
Cube construction and querying
Representation of query results by OLAP Modeling Markup Language
Integration with data mining engines (Association rules on OLAP)
Grid Data Mediation Service: Principles
Tight federation: global (relational) schema
Virtual integration: leave the data where it is; always up-to-date data
No proprietary solution: inherit well-solved aspects from OGSA-DAI
Not bound to a special architecture
Supported data sources: RDBMS (via JDBC), XMLDB (Xindice), CSV files
Operators: “Union all” and “inner join”
Operators are XQuery based (using SAXON)
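The semantics of the two operators can be sketched in plain Python. The slides state that the real operators are XQuery based, so this is a substitute illustration of what “Union all” and “inner join” compute, not the mediator's implementation. The row layout and the sample sources are hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, List

Row = Dict[str, Any]


def union_all(*sources: Iterable[Row]) -> Iterator[Row]:
    """Concatenate rows from all sources, keeping duplicates."""
    for source in sources:
        yield from source


def inner_join(left: Iterable[Row], right: Iterable[Row], key: str) -> Iterator[Row]:
    """Hash join: emit merged rows only where `key` matches on both sides."""
    index: Dict[Any, List[Row]] = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    for row in left:
        for match in index.get(row[key], []):
            yield {**row, **match}


# Hypothetical rows, as they might arrive from three kinds of sources.
rdbms_patients = [{"id": 1, "name": "Ann"}]
csv_patients = [{"id": 2, "name": "Bob"}, {"id": 1, "name": "Ann"}]
xml_visits = [{"id": 1, "visit": "first"}]

all_patients = list(union_all(rdbms_patients, csv_patients))
joined = list(inner_join(all_patients, xml_visits, "id"))
print(len(all_patients), joined)
```

Because “Union all” keeps duplicates, the patient with `id` 1 appears twice in `all_patients` and therefore twice in the join result.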
Data Integration Scenario
Heterogeneities:
  Name in A is "First Last" (as the target format)
  Name in C has to be combined
Distribution: 3 data sources
Data Integration Scenario (cont.)
Query: SELECT p_name FROM patient WHERE id=10
[Figure: the query translated to a standard and an optimized form]
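The scenario can be sketched as a virtual `patient` view assembled from three sources. This is a hedged illustration only: the slides give neither the source schemas nor the mediator's mapping format, so the column names, the sample rows, and the assumption that source B uses the same layout as source A are all hypothetical.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

Row = Dict[str, Any]

# Hypothetical source contents; column names are assumptions.
SOURCE_A: List[Row] = [{"id": 10, "name": "Jane Doe"}]
SOURCE_B: List[Row] = [{"id": 11, "name": "John Roe"}]   # assumed same layout as A
SOURCE_C: List[Row] = [{"id": 12, "first": "Max", "last": "Poe"}]


def from_single_name(row: Row) -> Row:
    """Source already stores the name as "First Last" (the target format)."""
    return {"id": row["id"], "p_name": row["name"]}


def from_split_name(row: Row) -> Row:
    """Source stores the name in two parts that have to be combined."""
    return {"id": row["id"], "p_name": f'{row["first"]} {row["last"]}'}


MAPPINGS: List[Tuple[List[Row], Callable[[Row], Row]]] = [
    (SOURCE_A, from_single_name),
    (SOURCE_B, from_single_name),
    (SOURCE_C, from_split_name),
]


def patient_view() -> Iterator[Row]:
    """Virtual global table: rows are mapped on demand, never copied."""
    for rows, mapper in MAPPINGS:
        for row in rows:
            yield mapper(row)


def select_p_name(patient_id: int) -> List[str]:
    """Equivalent of: SELECT p_name FROM patient WHERE id = patient_id."""
    return [row["p_name"] for row in patient_view() if row["id"] == patient_id]


print(select_p_name(10))
print(select_p_name(12))
```

The view reads the sources at query time, which matches the virtual integration principle of leaving the data where it is.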
Implementation/Technology
Globus 3.2
OGSA-DAI
GUI: workflow construction / results visualization (JGraph, Java Web Start, JavaServer Pages)
Service configuration (JavaServer Pages/PHP/..)
Knowledge base (XML, OWL)
Data mining Scenario
[Figure: data mining workflow. Labels: Database (100k rows); Select 10k rows; Decision Rules (SPRINT); Decision Rules (C45); Select 20k rows; Decision Rules (C45)]
Summary
Integrated data mining infrastructure
  Covers the whole process
  Service Oriented Architecture
  Implemented prototype
Project ongoing
  New data mining tasks (algorithms)
  Knowledge management
More information: http://www.gridminer.org