Data mining and data warehousing
Uploaded by aiswaryadevi-jaganmohan
Year          Evolution of data mining and warehousing
1960s         Data collection and database creation
1970s         Database management systems
Mid-1980s     Advanced database systems
Late 1980s    Data warehousing and data mining
1990s         Web-based databases
2006          Information systems
2013          Big data retrieval
Data mining refers to extracting or “mining” knowledge from large amounts of data.
It is also known by other names: knowledge mining from data, knowledge extraction, data/pattern analysis, data archaeology, data dredging, and knowledge discovery from data (KDD).
Knowledge Discovery Process:
1. Data cleaning
2. Data integration
3. Data selection
4. Data transformation
5. Data mining
6. Pattern evaluation
7. Knowledge presentation
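The seven steps can be sketched as a chain of small functions, one per step. This is only a toy illustration under stated assumptions: the two store tables, the function names, and the 50% support threshold are hypothetical, and simple pair counting stands in for the data mining step.

```python
"""A toy walk-through of the seven KDD steps on hypothetical sales records."""
from collections import Counter
from itertools import combinations

# Two hypothetical source tables (names and fields are illustrative only).
store_a = [
    {"tid": 1, "items": "milk, bread"},
    {"tid": 2, "items": "milk, bread, eggs"},
    {"tid": 3, "items": None},            # missing value, removed during cleaning
]
store_b = [
    {"tid": 4, "items": "Bread, Milk"},   # inconsistent case, fixed during transformation
    {"tid": 5, "items": "eggs"},
]

def clean(rows):
    # Step 1: data cleaning - drop records with missing items.
    return [r for r in rows if r["items"]]

def integrate(*sources):
    # Step 2: data integration - combine several sources into one collection.
    return [r for src in sources for r in src]

def select(rows):
    # Step 3: data selection - keep only the attribute relevant to the task.
    return [r["items"] for r in rows]

def transform(values):
    # Step 4: data transformation - turn each record into a set of lowercase items.
    return [frozenset(v.strip().lower() for v in s.split(",")) for s in values]

def mine(baskets):
    # Step 5: data mining - count how often each pair of items occurs together.
    counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return counts

def evaluate(counts, n, min_support=0.5):
    # Step 6: pattern evaluation - keep pairs whose support reaches the threshold.
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}

def present(patterns):
    # Step 7: knowledge presentation - turn the patterns into readable text.
    return [f"{a} & {b}: support {s:.0%}" for (a, b), s in sorted(patterns.items())]

baskets = transform(select(integrate(clean(store_a), clean(store_b))))
report = present(evaluate(mine(baskets), len(baskets)))
print("\n".join(report))  # prints: bread & milk: support 75%
```

Keeping each step in its own function mirrors the process description: any stage, such as cleaning, can be replaced without touching the others.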
Kinds of data that can be mined:
Relational databases
Data warehouses
Transactional databases
Object-relational databases
Temporal, sequence, and time-series databases
Spatial and spatiotemporal databases
Text and multimedia databases
Heterogeneous and legacy databases
Data streams and the World Wide Web
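1. Relational databases
A relational database is a collection of tables, each with a unique name. Each table consists of a set of attributes (columns) and usually stores a large set of tuples (rows).
In an object-relational database, each object (entity) is associated with a set of variables that describe it, a set of messages it uses to communicate with other objects, and a set of methods, each holding the code that implements a message.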
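To make the object model concrete, here is a minimal Python sketch. Python classes are used only as an analogy: the `Customer` class, its attributes, and its methods are hypothetical; instance attributes play the role of variables, and calling a method plays the role of sending a message.

```python
class Customer:
    """Hypothetical object, used as an analogy for an object-relational entity."""

    def __init__(self, cust_id, name, purchases=None):
        # Variables: attributes that describe the object.
        self.cust_id = cust_id
        self.name = name
        self.purchases = list(purchases or [])

    # Methods: code that implements each message the object understands.
    def total_spent(self):
        return sum(amount for _, amount in self.purchases)

    def add_purchase(self, item, amount):
        self.purchases.append((item, amount))

# Messages: other code communicates with the object by sending it a message,
# which in Python corresponds to calling one of its methods.
c = Customer(1, "Asha", [("pen", 20.0)])
c.add_purchase("notebook", 45.5)
print(c.total_spent())  # 65.5
```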
A temporal database typically stores relational data that include time-related attributes.
These attributes may involve several timestamps, each having different semantics.
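A temporal table often carries two timestamps with different semantics, for instance when a fact became true and when it stopped being true. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `employee_salary` table; the column names and the `9999-12-31` "still current" convention are assumptions made for illustration.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee_salary (
        emp_id     INTEGER,
        salary     INTEGER,
        valid_from TEXT,  -- date the salary took effect (ISO 8601)
        valid_to   TEXT   -- last date it applied; '9999-12-31' means still current
    )
""")
conn.executemany(
    "INSERT INTO employee_salary VALUES (?, ?, ?, ?)",
    [
        (7, 30000, "2020-01-01", "2021-06-30"),
        (7, 35000, "2021-07-01", "2023-03-31"),
        (7, 42000, "2023-04-01", "9999-12-31"),
    ],
)

def salary_as_of(emp_id, day):
    # Return the salary whose validity interval contains the given day.
    # ISO 8601 date strings compare correctly as plain text.
    row = conn.execute(
        "SELECT salary FROM employee_salary "
        "WHERE emp_id = ? AND valid_from <= ? AND ? <= valid_to",
        (emp_id, day, day),
    ).fetchone()
    return row[0] if row else None

print(salary_as_of(7, "2022-01-15"))  # 35000
```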
A sequence database stores sequences of ordered events, with or without a concrete notion of time.
Examples include customer shopping sequences, Web click streams, and biological sequences.
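Because a sequence database preserves the order of events, even a simple count of consecutive pairs can reveal common navigation paths. This sketch assumes hypothetical click-stream sessions and counts each ordered transition from one page to the next; it is a basic illustration, not a full sequential-pattern mining algorithm.

```python
from collections import Counter

# Hypothetical click-stream sequences: each list is one session's pages in order.
sessions = [
    ["home", "search", "product", "cart"],
    ["home", "product", "cart", "checkout"],
    ["search", "product", "cart"],
]

def transition_counts(sequences):
    # Count each ordered pair of consecutive events (a -> b) across all sequences.
    counts = Counter()
    for seq in sequences:
        counts.update(zip(seq, seq[1:]))
    return counts

counts = transition_counts(sessions)
print(counts.most_common(2))  # [(('product', 'cart'), 3), (('search', 'product'), 2)]
```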
A time-series database stores sequences of values or events obtained over repeated measurements of time (e.g., hourly, daily, weekly).
Examples include data collected from the stock exchange, inventory control, and the observation of natural phenomena (such as temperature and wind).
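Time-series values are often smoothed before analysis. The following sketch computes a simple moving average over hypothetical hourly temperature readings; the window size and the data are assumptions chosen for illustration.

```python
def moving_average(values, window):
    """Simple moving average over a fixed-size window of consecutive readings."""
    if window <= 0 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    running = sum(values[:window])
    result = [running / window]
    for i in range(window, len(values)):
        # Slide the window one step: add the newest value, drop the oldest.
        running += values[i] - values[i - window]
        result.append(running / window)
    return result

# Hypothetical hourly temperature readings (degrees Celsius).
hourly_temp = [20.0, 22.0, 24.0, 26.0, 25.0, 27.0]
print(moving_average(hourly_temp, 3))  # [22.0, 24.0, 25.0, 26.0]
```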
Data Warehouse: A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.
Spatial databases contain spatial-related information. Examples include geographic (map) databases, very large-scale integration (VLSI) or computer-aided design databases, and medical and satellite image databases.
Spatial data may be represented in raster format, consisting of n-dimensional bit maps or pixel maps. For example, a 2-D satellite image may be represented as raster data, where each pixel registers the rainfall in a given area.
Maps can also be represented in vector format, where roads, bridges, buildings, and lakes are represented as unions or overlays of basic geometric constructs, such as points, lines, polygons, and the partitions and networks formed by these components.
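The difference between raster and vector representations can be shown with two tiny structures. This sketch assumes a hypothetical 3x3 rainfall grid (raster) and a rectangular lake stored as a polygon (vector); the polygon's area is computed with the standard shoelace formula.

```python
# Raster format: a hypothetical 2-D grid where each cell (pixel) holds rainfall in mm.
rainfall_grid = [
    [0, 2, 5],
    [1, 4, 8],
    [0, 3, 6],
]

def total_rainfall(grid):
    # Sum the value stored in every pixel of the raster.
    return sum(sum(row) for row in grid)

# Vector format: a lake stored as a polygon, i.e. an ordered list of (x, y) vertices.
lake = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0)]

def polygon_area(vertices):
    # Shoelace formula: half the absolute value of the summed cross products
    # of each vertex with the next one (wrapping around to the first).
    n = len(vertices)
    s = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0

print(total_rainfall(rainfall_grid))  # 29
print(polygon_area(lake))             # 12.0
```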
A spatial database that stores spatial objects that change with time is called a spatiotemporal database, e.g., a database that tracks the changing position of a moving cricket ball.
Text databases are databases that contain word descriptions for objects.
Multimedia databases store image, audio, and video data.
A heterogeneous database consists of a set of interconnected, autonomous component databases.
A legacy database is a group of heterogeneous databases that combines different kinds of data systems, such as relational or object-oriented databases, hierarchical databases, network databases, spreadsheets, multimedia databases, or file systems.
Data streams: many applications generate and analyze stream data, in which data flow in and out of an observation platform (or window) dynamically.
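Because stream data flow through a window, analyses usually keep only the most recent values. The sketch below maintains a running average over a fixed-size sliding window using `collections.deque`; the window size of 3 and the sensor readings are hypothetical.

```python
from collections import deque

class SlidingWindowAverage:
    """Average of the most recent `size` stream values; older values leave the window."""

    def __init__(self, size):
        self.window = deque(maxlen=size)  # deque drops the oldest item automatically
        self.total = 0.0

    def add(self, value):
        if len(self.window) == self.window.maxlen:
            self.total -= self.window[0]  # this value is about to flow out of the window
        self.window.append(value)
        self.total += value
        return self.total / len(self.window)

# Hypothetical sensor readings arriving one at a time.
stream = [10, 20, 30, 40, 50]
w = SlidingWindowAverage(size=3)
averages = [w.add(x) for x in stream]
print(averages)  # [10.0, 15.0, 20.0, 30.0, 40.0]
```

Only the window and a running total are stored, so memory use stays constant no matter how long the stream runs.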
World Wide Web: capturing user access patterns in distributed information environments such as the Web is called Web usage mining (or Weblog mining).
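Web usage mining typically starts from server access logs. This minimal sketch parses a few hypothetical lines in the Common Log Format used by web servers such as Apache, then counts successful page hits and records each client's click path. The regular expression covers only this simple format and is an assumption, not a complete log parser.

```python
import re
from collections import Counter, defaultdict

# A few hypothetical lines in the Common Log Format.
LOG = """\
10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /home HTTP/1.1" 200 512
10.0.0.1 - - [10/Oct/2023:13:56:02 +0000] "GET /products HTTP/1.1" 200 2048
10.0.0.2 - - [10/Oct/2023:13:57:10 +0000] "GET /home HTTP/1.1" 200 512
10.0.0.2 - - [10/Oct/2023:13:58:45 +0000] "GET /cart HTTP/1.1" 404 128
10.0.0.1 - - [10/Oct/2023:13:59:30 +0000] "GET /cart HTTP/1.1" 200 256
"""

# host ident user [timestamp] "method path protocol" status bytes
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+$')

def parse(log_text):
    # Yield (client, path, status) for every line that matches the pattern.
    for line in log_text.splitlines():
        m = LINE_RE.match(line)
        if m:
            client, _ts, _method, path, status = m.groups()
            yield client, path, int(status)

page_hits = Counter()
paths_by_client = defaultdict(list)
for client, path, status in parse(LOG):
    if status == 200:  # count only successful requests
        page_hits[path] += 1
        paths_by_client[client].append(path)

print(page_hits.most_common())
print(dict(paths_by_client))
```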
Time Variant: The warehouse data represent the flow of data through time. The warehouse can even contain projected data.
Non-Volatile: Once data enter the data warehouse, they are never removed. The data warehouse is always growing.
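The time-variant and non-volatile properties can be illustrated with an append-only store. In this hypothetical sketch, each row carries its load date, loads only ever append, and an `as_of` query reconstructs what the warehouse held on a given day; the class and field names are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)  # frozen: a loaded row cannot be modified afterwards
class Snapshot:
    """One row in a hypothetical warehouse table, stamped with its load date."""
    load_date: date
    product: str
    units_sold: int

class AppendOnlyWarehouse:
    def __init__(self):
        self._rows = []

    def load(self, rows):
        # Non-volatile: new loads are appended; there is no update or delete operation.
        self._rows.extend(rows)

    def as_of(self, day):
        # Time-variant: return the rows the warehouse held on the given date.
        return [r for r in self._rows if r.load_date <= day]

    def __len__(self):
        return len(self._rows)

wh = AppendOnlyWarehouse()
wh.load([Snapshot(date(2024, 1, 31), "pen", 120)])
wh.load([Snapshot(date(2024, 2, 29), "pen", 95)])  # February adds a row; January's stays
print(len(wh))                           # 2
print(len(wh.as_of(date(2024, 1, 31))))  # 1
```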
Data warehouse products include Teradata, Oracle, SAP BW (Business Information Warehouse, SAP NetWeaver BI), Microsoft SQL Server, IBM DB2 (InfoSphere Warehouse), and SAS.
1984 — Metaphor Computer Systems, founded by David Liddle and Don Massaro, releases Data Interpretation System (DIS).
DIS was a hardware/software package and GUI for business users to create a database management and analytic system.
Survey (S) (2 minutes): The students are asked to browse the following titles and subtitles from the book.
Textbook: Han and Kamber, “Data Mining”, Second Edition, Elsevier, 2008, pages 105-109 and 2-21.
1. Data mining is also called
a) Knowledge mining
b) Knowledge mining from large data
c) Data extraction
d) None of the above
2. In the knowledge discovery process, data mining comes after which step?
a) Data transformation
b) Data selection
c) Neither (a) nor (b)
d) Both
3. Which data warehouse characteristic means that, once data enter the data warehouse, they are never removed?
a) Integrated
b) Time-variant
c) Subject-oriented
d) Non-volatile
4. An object-relational database consists of entities with
a) Variables
b) Messages
c) Methods
d) All the above
5. Web usage mining is also called
a) Web mining
b) Web log mining
c) None of the above
d) Both
Review questions:
1. Specify the seven steps in the KDD process.
2. Explain the four categories of data warehousing.
3. Define heterogeneous and legacy databases.
4. What are the data mining task primitives?
5. What are the different kinds of data to be mined?
A data warehouse maintains a copy of information from the source transaction systems. This architectural complexity provides the opportunity to:
Congregate data from multiple sources into a single database so a single query engine can be used to present data.
Mitigate the problem of database isolation level lock contention in transaction processing systems caused by attempts to run large, long running, analysis queries in transaction processing databases.
Maintain data history, even if the source transaction systems do not.
Integrate data from multiple source systems, enabling a central view across the enterprise. This benefit is always valuable, but particularly so when the organization has grown by merger.
Improve data quality, by providing consistent codes and descriptions, flagging or even fixing bad data.
Present the organization's information consistently.
Provide a single common data model for all data of interest regardless of the data's source.
Restructure the data so that it makes sense to the business users.
Restructure the data so that it delivers excellent query performance, even for complex analytic queries, without impacting the operational systems.
Add value to operational business applications, notably customer relationship management (CRM) systems.
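Integrating sources and providing consistent codes usually happens while data are loaded into the warehouse. This sketch assumes two hypothetical source systems that encode gender and dates differently; it maps both onto one common code and ISO dates, and flags values it cannot recognize instead of guessing.

```python
# Two hypothetical source systems that encode gender and dates differently.
crm_rows = [
    {"cust": "C1", "gender": "M", "joined": "2023-05-01"},
    {"cust": "C2", "gender": "F", "joined": "2023-06-12"},
]
billing_rows = [
    {"customer_id": "C3", "sex": "female", "start": "15/07/2023"},
    {"customer_id": "C4", "sex": "unknown", "start": "01/08/2023"},
]

# Common code set used in the warehouse (an assumption for this example).
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}

def to_iso(d_m_y):
    # Convert a DD/MM/YYYY string to ISO YYYY-MM-DD.
    day, month, year = d_m_y.split("/")
    return f"{year}-{month}-{day}"

def conform(cust, gender, joined):
    # Map the source value onto the common code; flag anything unrecognized.
    code = GENDER_CODES.get(gender.strip().lower())
    return {"customer_id": cust, "gender": code or "?", "joined": joined,
            "flagged": code is None}

warehouse = (
    [conform(r["cust"], r["gender"], r["joined"]) for r in crm_rows]
    + [conform(r["customer_id"], r["sex"], to_iso(r["start"])) for r in billing_rows]
)
for row in warehouse:
    print(row)
```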