SQL Server 2005: New Features & Business Intelligence
Kleanthis Georgaris
Technology Specialist
Microsoft Hellas
SQL Server 2005: A Complete Enterprise Data Management and BI Solution
- Analysis Services: OLAP & Data Mining
- Data Transformation Services: ETL
- SQL Server: Relational Engine
- Reporting Services
- Management Tools
- Development Tools
Agenda
- XML Support in SQL Server 2005
- .NET Inside the Database
- A step towards Object Oriented Programming
- User Defined Types
- Business Intelligence
- OLAP
- Data Mining
Data Representations
- Data can be represented in two ways
  - Relational (databases): requires infrastructure
  - Structured (XML): it is simply text
- Data is exchanged in XML format but stored in relational form
- We need convergence of the two models
- Three alternatives
  - XML can be stored as text
    - loses much of the value of the XML representation
  - XML can be decomposed into multiple relational tables
    - allows use of relational technologies
  - XML can be stored as an xml data type
    - allows use of XML technologies
Mapping Data Models
- Sometimes you need to mix data models
  - middle-tier processing is done with XML tools
  - a web service requires message content in XML
  - a browser requires XML for client-side processing
- But you have relational data
  - most data is stored using the relational model

Relational data in the database (company table):

id  name  company
37  Joe   D Inc.
41  May   A Co.
14  Sam   H Inc.
58  Bev   K Inc.

XML required for the message (content and identifiers mapped):

<organization>
  <title sn="37" org="D Inc."/>
  <title sn="41" org="A Co."/>
  <title sn="14" org="H Inc."/>
  ...
</organization>
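The mapping on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not how SQL Server does it: the function name `rows_to_organization_xml` is made up for the example, and the rows are copied from the company table above.

```python
import xml.etree.ElementTree as ET

# Rows copied from the slide's company table: (id, name, company).
COMPANY_ROWS = [
    (37, "Joe", "D Inc."),
    (41, "May", "A Co."),
    (14, "Sam", "H Inc."),
    (58, "Bev", "K Inc."),
]


def rows_to_organization_xml(rows):
    """Map each relational row to a <title sn=... org=.../> element.

    Hypothetical helper: the id column becomes the sn attribute and the
    company column becomes the org attribute, as on the slide.
    """
    root = ET.Element("organization")
    for row_id, _name, company in rows:
        ET.SubElement(root, "title", {"sn": str(row_id), "org": company})
    return ET.tostring(root, encoding="unicode")


organization_xml = rows_to_organization_xml(COMPANY_ROWS)
print(organization_xml)
```

The identifiers (id to sn) and the content (company to org) are both renamed on the way out, which is exactly the "content & identifiers mapped" step.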
XML as a data type
- The XML data type is a native database type
  - used as the type of a column in a table
  - used as the type of a parameter in a stored procedure
  - used as the type of the return value of a user-defined function
  - used as the type of a variable
XML data type - Example
CREATE TABLE xml_tab (the_id INTEGER, xml_col XML)
GO

-- automatic conversion from character strings
INSERT INTO xml_tab VALUES (1, '<doc/>')
INSERT INTO xml_tab VALUES (2, N'<doc/>')

SELECT CAST(xml_col AS VARCHAR(MAX))
FROM xml_tab WHERE the_id < 10

-- fails: not well-formed
INSERT INTO xml_tab
VALUES (3, '<doc><x1><x2></x1></x2></doc>')
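To see why the last INSERT is rejected, the same well-formedness check can be tried outside the database. The sketch below uses Python's standard XML parser as a stand-in; it is not SQL Server's parser, and `is_well_formed` is a name invented for the example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as well-formed XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# The two documents from the slide: one valid, one with crossed tags.
print(is_well_formed("<doc/>"))
print(is_well_formed("<doc><x1><x2></x1></x2></doc>"))
```

In the second document, `<x2>` is still open when `</x1>` arrives, so the elements overlap instead of nesting.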
XML column usage
- An XML column is not just a TEXT column
- XML technologies are supported
  - the contents can be validated using XML Schema
  - XML-aware indexes are supported
  - XQuery and XPath 2.0 are supported
  - in-database XML-related functionality works on the type
    - FOR XML
    - OpenXML
XML Demo
Hosted CLR
- The .NET CLR is hosted inside SQL Server to improve performance
  - applications run in the same address space as SQL Server
  - stored procedures can be written in any language supported by the CLR
  - web services can run inside SQL Server
[Figure: the SQL Server process hosting user .NET code, a T-SQL function, and the database]
.NET and Visual Studio Integration: Breakthrough in Developer Productivity
- Choice of programming language
  - T-SQL for data-intensive functions and procedures
  - .NET languages for CPU-intensive functions and procedures
- Choice of where to run logic
  - Database or mid-tier
  - Symmetric data access model: ADO.NET
- Integrated debugging experience across mid-tier and database tier
  - Seamlessly step across languages: T-SQL and .NET
  - Set breakpoints anywhere, inspect anything
- Flexible and extensible
  - User-defined functions, procedures, triggers
  - User-defined types and aggregates
  - XML data type
Development Environment
- New SQL Server Project template in VS 2005 for SQL Server 2005 managed code
- Server debug integration
  - Full debugger visibility
  - Set breakpoints anywhere
- Single-step support
  - Between languages
  - Between deployment tiers
- Auto-deployment
- Attributes
The Developer Experience
- VS .NET project: build in VB, C#, or C++ to produce an assembly ("TaxLib.dll")
- SQL data definition in SQL Server:
    create assembly …
    create function …
    create procedure …
    create trigger …
    create type …
- SQL queries:
    select sum(tax(sal, state))
    from Emp
    where county = 'King'
- Runtime hosted by SQL Server (in-proc)
SQL Web Services
- Native SOAP access
  - Standards-based access to SQL Server
  - No client dependency
  - Improved interoperability
- New "ENDPOINT AS HTTP" object
  - Configure connection info
  - Configure authentication
  - Expose functions and stored procedures
  - Expose T-SQL batches
[Figure: kernel-mode listener handling requests for http://server1/aspnet/default.aspx and http://server1/sql/pubs?wsdl]
Why user-defined types?
- Add scalars that extend the type system
  - used in sorts, aggregates
  - customized sort orders and arithmetic calculations
- Allow scalars to be implemented efficiently
  - compact representation
  - operations written in a compiled language
UDTs on the client
- SQL Server UDTs are "normal" .NET classes
  - can be used in clients as
    - parameters
    - DataReader column values
- Methods can be used on the client or server
- Code can be
  - locally available to clients
  - stored in the GAC
Using UDTs with T-SQL
- Using UDTs through Transact-SQL involves nothing new

/* assuming a UDT called Point has m_x and m_y properties
   CREATE TABLE point_tab (oid integer, point_col Point)
*/
SqlConnection conn = new SqlConnection("my connect string");
SqlCommand cmd = new SqlCommand();
cmd.Connection = conn;
conn.Open();
// insert a Point by converting its string form
cmd.CommandText = "insert into point_tab values(1, convert(Point, '10:10'))";
int i;
i = cmd.ExecuteNonQuery();
// SQL Server 2005 accesses UDT instance members with '.'
cmd.CommandText = "update point_tab set point_col.m_x = 15 where oid = 1";
i = cmd.ExecuteNonQuery();
UDTs and procedural code
-- T-SQL procedure (UDT instance members are accessed with '.')
CREATE PROCEDURE GetPoints (@a PointCls)
AS
SELECT thepoint.m_x, thepoint.m_y
FROM point_tab
WHERE thepoint.m_x > @a.m_x
GO

DECLARE @p PointCls
SET @p = CONVERT(PointCls, '1:1')
EXEC GetPoints @p

-- .NET function (EXTERNAL NAME is assembly.class.method)
CREATE FUNCTION AddPoints (@a PointCls, @b PointCls)
RETURNS PointCls
EXTERNAL NAME Point.PointCls.AddPoints
GO

DECLARE @a PointCls, @b PointCls, @c PointCls
SET @a = CONVERT(PointCls, '100:200')
SET @b = CONVERT(PointCls, '3:4')
SET @c = dbo.AddPoints(@a, @b)
SELECT @c.m_x
What is a Data Warehouse?
- Defined in many different ways, but not rigorously.
  - A decision support database that is maintained separately from the organization's operational database
  - Supports information processing by providing a solid platform of consolidated, historical data for analysis.
- "A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management's decision-making process." (W. H. Inmon)
- Data warehousing:
  - The process of constructing and using data warehouses
Data Warehouse: Subject-Oriented
- Organized around major subjects, such as customer, product, sales.
- Focuses on the modeling and analysis of data for decision makers, not on daily operations or transaction processing.
- Provides a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process.
Data Warehouse: Integrated
- Constructed by integrating multiple, heterogeneous data sources
  - relational databases, flat files, on-line transaction records
- Data cleaning and data integration techniques are applied.
  - Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources
    - E.g., hotel price: currency, tax, breakfast covered, etc.
  - When data is moved to the warehouse, it is converted.
Data Warehouse: Time Variant
- The time horizon for the data warehouse is significantly longer than that of operational systems.
  - Operational database: current value data.
  - Data warehouse data: provides information from a historical perspective (e.g., past 5-10 years)
- Every key structure in the data warehouse
  - Contains an element of time, explicitly or implicitly
  - But the key of operational data may or may not contain a "time element".
Data Warehouse: Non-Volatile
- A physically separate store of data transformed from the operational environment.
- Operational update of data does not occur in the data warehouse environment.
  - Does not require transaction processing, recovery, and concurrency control mechanisms
  - Requires only two operations in data accessing:
    - initial loading of data and access of data.
OLTP vs. OLAP

                    OLTP                             OLAP
users               clerk, IT professional           knowledge worker
function            day-to-day operations            decision support
DB design           application-oriented             subject-oriented
data                current, up-to-date;             historical, summarized,
                    detailed, flat relational;       multidimensional;
                    isolated                         integrated, consolidated
usage               repetitive                       ad hoc
access              read/write;                      lots of scans
                    index/hash on primary key
unit of work        short, simple transaction        complex query
# records accessed  tens                             millions
# users             thousands                        hundreds
DB size             100MB-GB                         100GB-TB
metric              transaction throughput           query throughput, response
Conceptual Modeling of Data Warehouses
- Modeling data warehouses: dimensions & measures
  - Star schema: a fact table in the middle connected to a set of dimension tables
  - Snowflake schema: a refinement of the star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to a snowflake
  - Fact constellations: multiple fact tables share dimension tables, viewed as a collection of stars, therefore called a galaxy schema or fact constellation
Example of Star Schema
- Sales Fact Table
  - keys: time_key, item_key, branch_key, location_key
  - measures: units_sold, dollars_sold, avg_sales
- Dimension tables
  - time: time_key, day, day_of_the_week, month, quarter, year
  - location: location_key, street, city, state_or_province, country
  - item: item_key, item_name, brand, type, supplier_type
  - branch: branch_key, branch_name, branch_type
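A star schema is queried by joining the fact table to the dimensions it references and grouping on dimension attributes. The sketch below shows this with Python's built-in sqlite3 module rather than SQL Server. It keeps only two of the four dimensions for brevity, the table names `time_dim`, `item_dim`, and `sales_fact` are chosen for the example, and all rows are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Cut-down star schema: one fact table, two dimension tables.
conn.executescript("""
CREATE TABLE time_dim (time_key INTEGER PRIMARY KEY, quarter TEXT, year INTEGER);
CREATE TABLE item_dim (item_key INTEGER PRIMARY KEY, item_name TEXT, type TEXT);
CREATE TABLE sales_fact (
    time_key     INTEGER REFERENCES time_dim(time_key),
    item_key     INTEGER REFERENCES item_dim(item_key),
    units_sold   INTEGER,
    dollars_sold REAL
);
""")

# Made-up sample rows, for illustration only.
conn.executemany("INSERT INTO time_dim VALUES (?, ?, ?)",
                 [(1, "Q1", 2004), (2, "Q2", 2004)])
conn.executemany("INSERT INTO item_dim VALUES (?, ?, ?)",
                 [(10, "TV", "video"), (11, "PC", "computer")])
conn.executemany("INSERT INTO sales_fact VALUES (?, ?, ?, ?)",
                 [(1, 10, 3, 900.0), (1, 11, 1, 1200.0),
                  (2, 10, 2, 600.0), (2, 11, 4, 4800.0)])

# Typical star join: the fact table in the middle, dimensions around it.
rows = conn.execute("""
SELECT t.quarter, i.item_name, SUM(f.units_sold), SUM(f.dollars_sold)
FROM sales_fact AS f
JOIN time_dim AS t ON t.time_key = f.time_key
JOIN item_dim AS i ON i.item_key = f.item_key
GROUP BY t.quarter, i.item_name
ORDER BY t.quarter, i.item_name
""").fetchall()

for row in rows:
    print(row)
```

The measures live only in the fact table; the dimensions supply the labels to group and filter by.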
Example of Snowflake Schema
- Sales Fact Table
  - keys: time_key, item_key, branch_key, location_key
  - measures: units_sold, dollars_sold, avg_sales
- Dimension tables
  - time: time_key, day, day_of_the_week, month, quarter, year
  - location: location_key, street, city_key
    - city: city_key, city, state_or_province, country
  - item: item_key, item_name, brand, type, supplier_key
    - supplier: supplier_key, supplier_type
  - branch: branch_key, branch_name, branch_type
Example of Fact Constellation
- Sales Fact Table
  - keys: time_key, item_key, branch_key, location_key
  - measures: units_sold, dollars_sold, avg_sales
- Shipping Fact Table
  - keys: time_key, item_key, shipper_key, from_location, to_location
  - measures: dollars_cost, units_shipped
- Dimension tables
  - time: time_key, day, day_of_the_week, month, quarter, year
  - location: location_key, street, city, province_or_state, country
  - item: item_key, item_name, brand, type, supplier_type
  - branch: branch_key, branch_name, branch_type
  - shipper: shipper_key, shipper_name, location_key, shipper_type
Multidimensional Data
- Sales volume as a function of product, month, and region
[Figure: cube with axes Product, Region, Month]
Dimensions: Product, Location, Time
Hierarchical summarization paths:
- Product: Industry > Category > Product
- Location: Region > Country > City > Office
- Time: Year > Quarter > Month or Week > Day
A Concept Hierarchy: Dimension (location)
Levels: all > region > country > city > office
- all
  - Europe
    - Germany
      - Frankfurt
      - ...
    - Spain
    - ...
  - North_America
    - Canada
      - Vancouver
        - L. Chan
        - M. Wind
        - ...
      - Toronto
      - ...
    - Mexico
    - ...
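Rolling up along a concept hierarchy means replacing each value with its parent and summing. A minimal sketch follows, assuming city-level sales figures that are invented for the example; the helper name `roll_up` is also an assumption.

```python
from collections import defaultdict

# Parent links for the location dimension (city -> country -> region).
CITY_TO_COUNTRY = {
    "Vancouver": "Canada",
    "Toronto": "Canada",
    "Frankfurt": "Germany",
}
COUNTRY_TO_REGION = {
    "Canada": "North_America",
    "Mexico": "North_America",
    "Germany": "Europe",
    "Spain": "Europe",
}

# Hypothetical sales per city, for illustration only.
SALES_BY_CITY = {"Vancouver": 120, "Toronto": 80, "Frankfurt": 50}


def roll_up(values, parent_of):
    """Aggregate values one level up the hierarchy."""
    totals = defaultdict(int)
    for child, amount in values.items():
        totals[parent_of[child]] += amount
    return dict(totals)


sales_by_country = roll_up(SALES_BY_CITY, CITY_TO_COUNTRY)
sales_by_region = roll_up(sales_by_country, COUNTRY_TO_REGION)
sales_all = sum(sales_by_region.values())

print(sales_by_country)
print(sales_by_region)
print(sales_all)
```

Drill-down is the reverse direction: starting from a region total and expanding it back into its countries and cities.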
A Sample Data Cube
[Figure: sales cube with dimensions Date (1Qtr, 2Qtr, 3Qtr, 4Qtr, sum), Product (TV, VCR, PC, sum), and Country (U.S.A, Canada, Mexico, sum); the highlighted cell is the total annual sales of TV in U.S.A.]
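The "sum" rows and columns of the cube are aggregates in which one or more dimensions have been generalized. The sketch below computes every such aggregate from base cells; the cell values are invented, and `build_cube` is a name chosen for the example rather than an Analysis Services API.

```python
from collections import defaultdict
from itertools import product

# Hypothetical base cells: (date, product, country) -> sales.
CELLS = {
    ("1Qtr", "TV", "U.S.A"): 10,
    ("2Qtr", "TV", "U.S.A"): 20,
    ("1Qtr", "PC", "U.S.A"): 5,
    ("1Qtr", "TV", "Canada"): 7,
}


def build_cube(cells):
    """Add each base cell to every aggregate cell that covers it.

    For each dimension, the value is either kept or replaced by "sum",
    giving 2**n aggregate keys per base cell.
    """
    cube = defaultdict(int)
    for key, amount in cells.items():
        for mask in product((False, True), repeat=len(key)):
            aggregated = tuple(
                "sum" if generalize else value
                for value, generalize in zip(key, mask)
            )
            cube[aggregated] += amount
    return dict(cube)


cube = build_cube(CELLS)

# Total annual sales of TV in U.S.A.: the date dimension is summed out.
tv_usa_annual = cube[("sum", "TV", "U.S.A")]
print(tv_usa_annual)
print(cube[("sum", "sum", "sum")])
```

Precomputing these aggregates is what lets an OLAP server answer summary queries without scanning the base data.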
OLAP Server Architectures
- Relational OLAP (ROLAP)
  - Uses a relational or extended-relational DBMS to store and manage warehouse data, and OLAP middleware to support missing pieces
  - Includes optimization of the DBMS backend, implementation of aggregation navigation logic, and additional tools and services
  - greater scalability
- Multidimensional OLAP (MOLAP)
  - Array-based multidimensional storage engine (sparse matrix techniques)
  - fast indexing to pre-computed summarized data
- Hybrid OLAP (HOLAP)
  - User flexibility, e.g., low level: relational, high level: array
- Specialized SQL servers
  - specialized support for SQL queries over star/snowflake schemas
Data Warehouse Usage
- Three kinds of data warehouse applications
  - Information processing
    - supports querying, basic statistical analysis, and reporting using crosstabs, tables, charts and graphs
  - Analytical processing
    - multidimensional analysis of data warehouse data
    - supports basic OLAP operations: slice-dice, drilling, pivoting
  - Data mining
    - knowledge discovery from hidden patterns
    - supports associations, constructing analytical models, performing classification and prediction, and presenting the mining results using visualization tools.
- Differences among the three tasks
IT for the Past, Present and Future
- Archiving the past: storage, writing, etc.
- Awareness of the present: networking, telecom, etc.
- Predicting the future: this is where the action is!
- What is needed?
  - Data about the past and present
  - Models for how systems evolve
  - Ability to associate data with system models
  - Predict the future and develop a course of action
- Let's enumerate some applications...
Necessity Is the Mother of Invention
- Data explosion problem
  - Automated data collection tools and mature database technology lead to tremendous amounts of data accumulated and/or to be analyzed in databases, data warehouses, and other information repositories
- We are drowning in data, but starving for knowledge!
- Solution: data warehousing and data mining
  - Data warehousing and on-line analytical processing
  - Mining interesting knowledge (rules, regularities, patterns, constraints) from data in large databases
What Is Data Mining?
- Data mining (knowledge discovery from data)
  - Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data
  - Data mining: a misnomer?
- Alternative names
  - Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.
Data Mining Process
- Data mining: core of the knowledge discovery process
[Figure: Databases -> Data Cleaning and Data Integration -> Data Warehouse -> Selection -> Task-relevant Data -> Data Mining -> Pattern Evaluation]
Complete Set of Algorithms
- Decision Trees
- Clustering
- Time Series
- Sequence Clustering
- Association
- Naïve Bayes
- Neural Net
[Figure label: Introduced in SQL Server 2000]
What Is Association Mining?
- Association rule mining:
  - Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.
  - Frequent pattern: a pattern (set of items, sequence, etc.) that occurs frequently in a database [AIS93]
- Motivation: finding regularities in data
  - What products were often purchased together? Beer and diapers?!
  - What are the subsequent purchases after buying a PC?
  - What kinds of DNA are sensitive to this new drug?
  - Can we automatically classify web documents?
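"Purchased together" is usually quantified with support (the fraction of transactions containing an itemset) and confidence (how often the consequent appears when the antecedent does). The following is a brute-force sketch over invented shopping baskets; it is not the algorithm used by SQL Server's Association model, and the function names are assumptions.

```python
from itertools import combinations

# Hypothetical market baskets, for illustration only.
TRANSACTIONS = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for basket in transactions if itemset <= basket)
    return hits / len(transactions)


def frequent_pairs(transactions, min_support):
    """Enumerate all item pairs and keep those meeting min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for pair in combinations(items, 2):
        s = support(pair, transactions)
        if s >= min_support:
            result[pair] = s
    return result


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent) / support(antecedent)."""
    both = support(set(antecedent) | set(consequent), transactions)
    return both / support(antecedent, transactions)


pairs = frequent_pairs(TRANSACTIONS, min_support=0.4)
beer_to_diapers = confidence({"beer"}, {"diapers"}, TRANSACTIONS)
print(pairs)
print(beer_to_diapers)
```

Enumerating every pair does not scale; practical miners prune candidates using the fact that every subset of a frequent itemset must itself be frequent.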
Classification vs. Prediction
- Classification:
  - predicts categorical class labels (discrete or nominal)
  - classifies data (constructs a model) based on the training set and the values (class labels) in a classifying attribute, and uses it in classifying new data
- Prediction:
  - models continuous-valued functions, i.e., predicts unknown or missing values
- Typical applications
  - credit approval
  - target marketing
  - medical diagnosis
  - treatment effectiveness analysis
Classification Process (1): Model Construction
Training data (input to the classification algorithms):

NAME  RANK            YEARS  TENURED
Mike  Assistant Prof  3      no
Mary  Assistant Prof  7      yes
Bill  Professor       2      yes
Jim   Associate Prof  7      yes
Dave  Assistant Prof  6      no
Anne  Associate Prof  3      no

Classifier (model) produced:
IF rank = 'professor' OR years > 6 THEN tenured = 'yes'
Classification Process (2): Use the Model in Prediction
Testing data (applied to the classifier):

NAME     RANK            YEARS  TENURED
Tom      Assistant Prof  2      no
Merlisa  Associate Prof  7      no
George   Professor       5      yes
Joseph   Assistant Prof  7      yes

Unseen data: (Jeff, Professor, 4)
Tenured?
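The two steps can be replayed in code: apply the rule learned in step (1) to the training rows, then to the testing rows, then to the unseen record. This is a sketch that hard-codes the rule from the slide; the function names are made up, and the rows are copied from the two tables.

```python
def predict_tenured(rank, years):
    """The classifier from the slide: rank = 'professor' OR years > 6."""
    return "yes" if rank.lower() == "professor" or years > 6 else "no"


# (name, rank, years, tenured), copied from the slides.
TRAINING = [
    ("Mike", "Assistant Prof", 3, "no"),
    ("Mary", "Assistant Prof", 7, "yes"),
    ("Bill", "Professor", 2, "yes"),
    ("Jim", "Associate Prof", 7, "yes"),
    ("Dave", "Assistant Prof", 6, "no"),
    ("Anne", "Associate Prof", 3, "no"),
]
TESTING = [
    ("Tom", "Assistant Prof", 2, "no"),
    ("Merlisa", "Associate Prof", 7, "no"),
    ("George", "Professor", 5, "yes"),
    ("Joseph", "Assistant Prof", 7, "yes"),
]


def accuracy(rows):
    """Fraction of rows whose predicted label matches the known label."""
    correct = sum(
        1 for _name, rank, years, label in rows
        if predict_tenured(rank, years) == label
    )
    return correct / len(rows)


training_accuracy = accuracy(TRAINING)
testing_accuracy = accuracy(TESTING)
jeff_prediction = predict_tenured("Professor", 4)

print(training_accuracy, testing_accuracy, jeff_prediction)
```

The rule fits every training row but misses Merlisa in the testing data, which is why accuracy is estimated on data the model has not seen.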
Training Dataset

age     income  student  credit_rating  buys_computer
<=30    high    no       fair           no
<=30    high    no       excellent      no
31…40   high    no       fair           yes
>40     medium  no       fair           yes
>40     low     yes      fair           yes
>40     low     yes      excellent      no
31…40   low     yes      excellent      yes
<=30    medium  no       fair           no
<=30    low     yes      fair           yes
>40     medium  yes      fair           yes
<=30    medium  yes      excellent      yes
31…40   medium  no       excellent      yes
31…40   high    yes      fair           yes
>40     medium  no       excellent      no

This follows an example from Quinlan's ID3
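ID3 picks the attribute to split on by information gain: the drop in class entropy after partitioning on that attribute. Here is a small sketch over the 14 rows above, written for illustration and not taken from any SQL Server component; the age range is spelled `31..40` in the code.

```python
from collections import Counter
from math import log2

ATTRIBUTES = ("age", "income", "student", "credit_rating")

# (age, income, student, credit_rating, buys_computer), from the table.
ROWS = [
    ("<=30", "high", "no", "fair", "no"),
    ("<=30", "high", "no", "excellent", "no"),
    ("31..40", "high", "no", "fair", "yes"),
    (">40", "medium", "no", "fair", "yes"),
    (">40", "low", "yes", "fair", "yes"),
    (">40", "low", "yes", "excellent", "no"),
    ("31..40", "low", "yes", "excellent", "yes"),
    ("<=30", "medium", "no", "fair", "no"),
    ("<=30", "low", "yes", "fair", "yes"),
    (">40", "medium", "yes", "fair", "yes"),
    ("<=30", "medium", "yes", "excellent", "yes"),
    ("31..40", "medium", "no", "excellent", "yes"),
    ("31..40", "high", "yes", "fair", "yes"),
    (">40", "medium", "no", "excellent", "no"),
]


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())


def information_gain(rows, attr_index):
    """Class entropy minus the weighted entropy after the split."""
    labels = [row[-1] for row in rows]
    remainder = 0.0
    for value in {row[attr_index] for row in rows}:
        subset = [row[-1] for row in rows if row[attr_index] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return entropy(labels) - remainder


class_entropy = entropy([row[-1] for row in ROWS])
gains = {name: information_gain(ROWS, i) for i, name in enumerate(ATTRIBUTES)}
best_attribute = max(gains, key=gains.get)

print(round(class_entropy, 3))
for name, gain in gains.items():
    print(name, round(gain, 3))
print(best_attribute)
```

The attribute with the largest gain becomes the root test, and the same procedure is then repeated inside each branch.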
Output: A Decision Tree for "buys_computer"
age?
- <=30: student?
  - no: no
  - yes: yes
- 31…40: yes
- >40: credit rating?
  - excellent: no
  - fair: yes
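Once built, the tree is just a chain of attribute tests. A minimal sketch follows, encoding the tree above as nested tuples and dictionaries; this representation and the `classify` helper are choices made for the example, and the age range is spelled `31..40` in the code.

```python
# Inner nodes are (attribute, {value: subtree}); leaves are class labels.
TREE = ("age", {
    "<=30": ("student", {"no": "no", "yes": "yes"}),
    "31..40": "yes",
    ">40": ("credit_rating", {"excellent": "no", "fair": "yes"}),
})


def classify(node, record):
    """Follow the branch matching the record until a leaf is reached."""
    while not isinstance(node, str):
        attribute, branches = node
        node = branches[record[attribute]]
    return node


young_student = {"age": "<=30", "income": "low",
                 "student": "yes", "credit_rating": "fair"}
senior_excellent = {"age": ">40", "income": "medium",
                    "student": "no", "credit_rating": "excellent"}

print(classify(TREE, young_student))
print(classify(TREE, senior_excellent))
```

Note that income never appears in the tree: it had the lowest information gain and was not needed to separate the training rows.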