ICTP-Sir jamia millia


  • 8/14/2019 ICTP-Sir jamia millia

    1/29

    1Harleen Kaur, Jamia Millia Islamia, New Delhi, India.

    WEB BASED DATA MINING SYSTEM FOR HEALTHCARE

    Harleen Kaur Jamia Millia Islamia, New Delhi, India.

    Email- [email protected] Tel No- +91-9891174111

    Special Thanks to Prof Siri Krishan Wasan (Jamia Millia Islamia, Deptt. of Mathematics, New Delhi, India) and

    Dr Vasudha Bhatnagar (Deptt. of Computer Science, University of Delhi, New Delhi, India).


    Outline

    Why and What is Data Mining

    Data Mining and Healthcare

    Web Databases

    Proposed System

    Summary

    References


    DBMS and Data Mining

    A Database Management System (DBMS) is a software package designed to store and manage databases

    A database is a very large, integrated collection of data that models a real-world enterprise (entities and relationships)

    Database data are retrieved as stored; query results are a subset of the data

    Data mining is the extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data

    As databases grow larger, decision-making directly from the raw data is not possible; knowledge must be derived from the stored data

    Data mining data need to be cleaned (to some extent) before producing the results

    Data mining results are an analysis of the data


    Why data mining?

    Large volume of data (voluminous)

    Dimensionality of data

    High data growth rate

    There is a need to discover valid, useful, structural and understandable patterns

    Alternative names:

    Knowledge discovery in databases (KDD)

    Knowledge extraction

    Data/pattern analysis

    Process of discovering knowledge/patterns in data


    Knowledge Discovery in Databases (KDD) and Data Mining

    Knowledge Discovery in Databases

    Knowledge Discovery in Databases is the nontrivial process of identifying valid, novel, potentially useful, and ultimately understandable patterns in data: Fayyad (1996).

    Process of searching for trends and valuable anomalies in large datasets

    Data Mining

    Data Mining is the non-trivial extraction of implicit, previously unknown and potentially useful information from data

    It is the core step of KDD and results in the discovery of knowledge


    Knowledge discovery process

    Knowledge discovery in databases (KDD) process

    - Data selection: identify target datasets and relevant fields

    - Data cleaning and transformation (preprocessing)
        - Remove noise and outliers
        - Create common units (common data repository from all sources)
        - Generate new fields

    - Data mining model construction

    - Model evaluation and visualization for the generated results
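
The steps above can be sketched as a small pipeline. This is only an illustrative sketch: the patient records, the field names (`age`, `weight_kg`, `weight_lb`, `diagnosis`), the outlier cutoff and the deliberately trivial "model" (mean age per diagnosis) are all assumptions made for the example and do not come from the slides.

```python
from statistics import mean

# Hypothetical raw records from two sources; all field names are illustrative.
RAW = [
    {"id": 1, "age": 54, "weight_kg": 80.0, "diagnosis": "diabetes", "phone": "x"},
    {"id": 2, "age": 61, "weight_lb": 198.0, "diagnosis": "diabetes", "phone": "y"},
    {"id": 3, "age": 430, "weight_kg": 70.0, "diagnosis": "asthma", "phone": "z"},  # noisy age
    {"id": 4, "age": 30, "weight_kg": 65.0, "diagnosis": "asthma", "phone": "w"},
]


def select(records, fields):
    """Data selection: keep only the relevant fields of each record."""
    return [{k: r[k] for k in fields if k in r} for r in records]


def clean(records, max_age=120):
    """Data cleaning: drop records whose age is an implausible outlier."""
    return [r for r in records if 0 <= r["age"] <= max_age]


def transform(records):
    """Transformation: convert weights to a common unit (kg), add an age group."""
    out = []
    for r in records:
        kg = r["weight_kg"] if "weight_kg" in r else r["weight_lb"] * 0.45359237
        out.append({
            "age": r["age"],
            "weight_kg": round(kg, 1),
            "age_group": "60+" if r["age"] >= 60 else "<60",  # generated field
            "diagnosis": r["diagnosis"],
        })
    return out


def mine(records):
    """Model construction (placeholder model): mean age per diagnosis."""
    groups = {}
    for r in records:
        groups.setdefault(r["diagnosis"], []).append(r["age"])
    return {d: mean(ages) for d, ages in groups.items()}


def run_pipeline(raw):
    """Run selection, cleaning, transformation and mining in order."""
    selected = select(raw, ["age", "weight_kg", "weight_lb", "diagnosis"])
    prepared = transform(clean(selected))
    return prepared, mine(prepared)


prepared, model = run_pipeline(RAW)
print(model)  # evaluation/visualization would inspect this result
```

In a real system the `mine` step would be replaced by one of the techniques discussed later (classification, clustering, association mining); the point here is only the order of the stages.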


    Knowledge Discovery in Databases: Process

    [Figure: Data -> (Selection) -> Target Data -> (Preprocessing and Transformation) -> Preprocessed/Transformed Data -> (Data Mining) -> Patterns -> (Model Evaluation) -> Knowledge]

    Adapted from: Usama M. Fayyad, et al. (1996), From Data Mining to Knowledge Discovery: An Overview, Advances in Knowledge Discovery and Data Mining, U. Fayyad et al. (Eds.), AAAI/MIT Press


    Data Mining: Types of Data

    - Relational data

    - Spatial / Temporal data

    - Numeric data

    - Categorical data

    - Time-series data

    - Text

    - Images / Video / Multimedia

    - Web data


    Integration of Multiple Technologies

    [Figure: Data Mining at the center, drawing on Machine Learning, Database Management, Artificial Intelligence, Statistics, Visualization, Algorithms, and High Performance Computing]


    Applications adapted

    Retail marketing

    Telecommunication

    Banking

    Fraud analysis

    Bio-data mining

    Stock market analysis

    Web mining


    Data Mining Tasks

    There are two main tasks of data mining:

    Predictive data mining

    - Predicts future or unknown values

    - Example: classification, using rule induction, decision trees, neural networks, Bayesian networks, regression, genetic algorithms, support vector machines

    Descriptive data mining

    - Produces a model that describes the observed data

    - Examples: association rules, clustering


    Data Mining Techniques

    Common mining techniques

    - Classification

    - Clustering

    - Associations

    Other techniques are

    - Sequential patterns

    - Regression

    - Deviation detection


    Classification

    Given a collection of records (training set)

    Each record contains a set of attributes; one of the attributes is the class

    Find a model for the class attribute as a function of the values of the other attributes

    Goal: previously unseen records should be assigned a class as accurately as possible

    A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets, with the training set used to build the model and the test set used to validate it.
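
The train/test procedure described above can be illustrated with a short sketch. The slides do not name a specific classifier here, so a 1-nearest-neighbour rule is used purely as a stand-in; the data set, the 30% test fraction and the function names are assumptions for illustration.

```python
import random


def split(records, test_fraction=0.3, seed=0):
    """Shuffle labelled records and divide them into training and test sets."""
    rng = random.Random(seed)
    shuffled = records[:]
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def predict(train, features):
    """1-nearest-neighbour: return the class of the closest training record."""
    def squared_distance(record):
        return sum((a - b) ** 2 for a, b in zip(record[0], features))
    return min(train, key=squared_distance)[1]


def accuracy(train, test):
    """Fraction of test records whose predicted class equals the true class."""
    correct = sum(predict(train, x) == y for x, y in test)
    return correct / len(test)


# Hypothetical data: (attributes, class), two well-separated groups.
DATA = [((1.0 + 0.1 * i, 1.0), "low") for i in range(10)] + \
       [((8.0 + 0.1 * i, 9.0), "high") for i in range(10)]

train, test = split(DATA)
print(len(train), len(test), accuracy(train, test))
```

The model is built only from `train`; `test` plays the role of the previously unseen records, so the reported accuracy is an estimate of how well the model generalizes.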


    Clustering

    Clustering is unsupervised

    Unlike classification, clustering uses no pre-classified data

    Search for groups or clusters of data points (records) that are similar to one another

    Data points in a cluster have high intra-cluster similarity and low inter-cluster similarity

    Distance is used as a measure of similarity

    Applications

    - As a stand-alone tool to get insight into data distribution

    - As a preprocessing step for other algorithms
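
As one possible illustration of distance-based clustering, the sketch below implements plain k-means. The slides do not prescribe k-means specifically; the algorithm choice, the naive seeding (the first `k` points become the initial centroids) and the sample points are assumptions for the example.

```python
import math


def kmeans(points, k, iterations=20):
    """Plain k-means with Euclidean distance.

    Seeding is deliberately naive: the first k points are the initial centroids.
    """
    centroids = [points[i] for i in range(k)]
    clusters = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters


# Hypothetical unlabelled records with two numeric attributes each.
POINTS = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]

centroids, clusters = kmeans(POINTS, k=2)
print(clusters)
```

No class labels are supplied anywhere: the groups emerge only from the distances between points, which is what makes the method unsupervised.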


    Association Mining

    Association rule mining:

    Finding frequent patterns, associations and correlations among sets of items or objects in transaction databases, relational databases, and other information repositories.

    Frequent pattern: a pattern (set of items, sequence, etc.) that occurs frequently in a database

    Motivation: finding regularities in data

    What products were often purchased together?

    What are the subsequent purchases after buying a PC?

    What kinds of DNA are sensitive to this new drug?

    Broad applications

    Market basket data analysis, cross-marketing, catalog design, sale campaign analysis

    Web log analysis, DNA sequence analysis, etc.
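
The "purchased together" question can be made concrete with a brute-force sketch. The slides do not define how "frequent" is measured; the example assumes the standard support and confidence measures, and the transactions, item names and threshold are invented for illustration.

```python
from itertools import combinations

# Hypothetical market-basket transactions.
TRANSACTIONS = [
    {"pc", "printer", "paper"},
    {"pc", "printer"},
    {"pc", "mouse"},
    {"printer", "paper"},
    {"pc", "printer", "mouse"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def frequent_itemsets(transactions, min_support, max_size=2):
    """Brute force: test every itemset up to max_size against min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, max_size + 1):
        for combo in combinations(items, size):
            s = support(combo, transactions)
            if s >= min_support:
                result[combo] = s
    return result


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) for the rule antecedent -> consequent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


frequent = frequent_itemsets(TRANSACTIONS, min_support=0.4)
print(frequent)
print(confidence({"pc"}, {"printer"}, TRANSACTIONS))
```

Enumerating every candidate itemset is only workable for tiny examples; practical algorithms prune the search space, but the measures they compute are the same.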


    Commercial/Research Data Mining tools

    Some commercial and research data mining tools are

    - WEKA (The University of Waikato): http://www.cs.waikato.ac.nz/ml/weka/

    - Clementine (SPSS Inc., Integral Solutions): http://www.spss.com/clementine/

    - Bayesialab (Bayesia SA): http://www.bayesia.com/

    - MineSet (Silicon Graphics Inc., SGI): http://www.sgi.com/products/

    - Intelligent Miner (IBM Corp.): http://www.ibm.com/legal/copytrade.shtml

    - Web Analyst (Megaputer Intelligence Inc.): http://www.megaputer.com/products

    - SurfAid Analysis (IBM Corp.): http://www.nwc.com/


    Need for mining healthcare data

    Extraction of knowledge for diagnostic, screening, prognostic, monitoring and overall patient management tasks

    Hospital Administration

    Strategic decision making

    Control cost

    Quality of service

    Reduce adverse drug events

    Analysis of epidemiological data

    Predicting patterns of disease

    Need to develop a system that can support the sharing and reuse of medical knowledge


    Disciplines of Healthcare system where data mining tools can be applied

    [Figure: Data Mining Tools at the center, linked to Treatment, Hospital Information System, Clinical Modeling, Medical Imaging, Diagnosis, and Drug Development]


    Mining Issues in Healthcare data

    Issues to be addressed

    Handling heterogeneous data

    Distributed data

    High dimensional data

    Visual data mining

    Privacy-preserving mining


    Mining of Web Databases

    The Web is a collection of inter-related files on one or more Web servers.

    Application of data mining techniques: Web mining

    - Web content mining: process of discovering information from online sources

    - Web usage mining: process of discovering/mining structural information from user browsing and access patterns

    Some web medical databases are

    TRIP Database, one of the Internet's leading medical resources. The TRIP Database allows users to rapidly and easily identify high quality medical literature from a wide range of sources, at http://tripdatabase.com

    Ovid database provides information to tackle scientific or medical questions, at http://ovid.com

    MEDLINE on BioMedNet: MEDLINE on Scirus is the search engine for science, at http://www.scirus.com

    Pharmacological Targets Database (PTBase) is no longer available. MDL Elsevier has now released xPharm: a fully interactive pharmacological database, with 800% more target data than any other online source, at http://bmn.com

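
Web usage mining, as described on this slide, typically starts from server access logs. The sketch below shows one possible first step: counting page hits and recovering each visitor's browsing path. It assumes log lines in the Common Log Format; the hosts, paths and timestamps are invented for illustration and do not come from the proposed system.

```python
import re
from collections import Counter, defaultdict

# Assumed Common Log Format lines; hosts and paths are made up for the example.
LOG = """\
10.0.0.1 - - [14/Aug/2019:10:00:00 +0000] "GET /symptoms HTTP/1.1" 200 512
10.0.0.1 - - [14/Aug/2019:10:01:00 +0000] "GET /appointments HTTP/1.1" 200 734
10.0.0.2 - - [14/Aug/2019:10:02:00 +0000] "GET /symptoms HTTP/1.1" 200 512
10.0.0.2 - - [14/Aug/2019:10:03:00 +0000] "GET /missing HTTP/1.1" 404 0
"""

# host ident user [timestamp] "METHOD path protocol" status size
LINE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)$')


def parse(log_text):
    """Yield (host, method, path, status) for every well-formed log line."""
    for line in log_text.splitlines():
        match = LINE.match(line)
        if match:
            host, _when, method, path, status, _size = match.groups()
            yield host, method, path, int(status)


def usage_patterns(log_text):
    """Count successful page hits and collect each host's browsing path."""
    hits = Counter()
    paths_by_host = defaultdict(list)
    for host, _method, path, status in parse(log_text):
        if status == 200:  # ignore failed requests
            hits[path] += 1
            paths_by_host[host].append(path)
    return hits, dict(paths_by_host)


hits, paths = usage_patterns(LOG)
print(hits.most_common())
print(paths)
```

The per-host paths produced here are the kind of input that sequential-pattern or association mining could then analyze.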

    Web Medical Databases

    Some online medical information sources include:

    CancerNet provides information about cancer, including state-of-the-art information on cancer screening, prevention, treatment and supportive care, and summaries of clinical trials. (http://www.nci.nih.gov)

    CancerNet for Patients and the Public includes access to PDQ (Physician Data Query) and related information on treatments; detection, prevention and genetics information; supportive care information; clinical trial information. (http://cancernet.nci.nih.gov/patient.htm)

    CancerLit is a comprehensive archival file of more than one million bibliographic records (most with abstracts) describing 30 years of cancer research published in biomedical journals, proceedings of scientific meetings, books, technical reports, and other documents. (http://wwwicic.nci.nih.gov/canlit/canlit.htm)

    CancerNet for Health Professionals includes access to PDQ and related information on treatments, screening, prevention and genetics; supportive care and advocacy issues; clinical trials; a directory of genetic counselors. (http://wwwicic.nci.nih.gov/health.htm)


    Challenges to Medical Information Systems

    Integrating various medical data sources such as server access logs, referrer logs, patient registration or patient profile information

    Resolving difficulties in the diagnosis of diseases due to unique key attributes in the patient record which can easily be predicted

    Predicting patient treatment

    Prescribing patient medication

    Helping patients maintain their independence and maximum level of function within their own homes and communities

    The goal is to educate patients in self-care and to provide prolonged medical monitoring and supervision


    Proposed Web based Healthcare system

    Medical Tele-monitoring and Tele-care facilities

    System can easily expand to cover all the healthcare spectra

    Interface provides a friendly environment both for the patient and for the physician

    [Figure: Components of Medical System: Patients, Doctor, Medical Staff, Knowledge Refereed Server]


    Features of the system

    Patients are able to send their medical problems via the WWW by completing online web forms

    The doctor is able to browse the data and check on the patients

    Proposed system is a user-friendly, cost-effective and powerful tool

    Proposed medical system includes not only diagnosis and medical treatment but also prolonged medical monitoring and supervision

    The goal of medical care is to control disease processes and to help patients maintain their maximum level of function within their own homes and communities


    Motivation

    The World Wide Web is the richest and most dense source of information

    The Web/WAP portal is able to gather information from patients regardless of their location

    As the data in the database expand as a result of the wide use of the portal, it becomes difficult to find information manually

    Data mining provides algorithms which allow automatic pattern discovery and interactive analysis

    The system can support the doctor's effort by posting alerts whenever a patient's health is in a critical condition.


    Summary

    The Web has been adopted as a critical communication and information medium by a majority of the population

    Web data is growing at a significant rate

    A number of new data mining concepts and techniques have been developed using this concept

    Many successful applications exist

    Fertile area of research

    Privacy




    In our move towards becoming a developed nation, to provide an honorable and comfortable life to Indians

    THANK YOU !!!