Oracle vs SQL Server
Dr. Alex Wang
Oracle Text
• Oracle Text uses standard SQL to do almost everything.
• Full-text retrieval technology for dealing with unstructured data.
• Data sources can be database tables, flat files, or web sites.
• Indexes, searches, and analyzes text and documents.
• Searching: keyword searching, context queries, pattern matching, thematic queries, HTML/XML section searching.
• Uses relevance ranking to improve search quality.
• Supported formats: PDF, MS Office, HTML, XML
Search Operators used in Oracle
Context search
• Near - returns a score based on the proximity of two or more terms.
Pattern search
• Fuzzy - matches terms that are spelled similarly.
• Soundex - matches terms that sound alike.
• Stem - searches for all terms with the same root.
Use thesaurus
• Preferred Term - replaces the query term with the preferred term defined in a thesaurus.
• Related Term - expands to all related terms defined in a thesaurus.
• Synonym - expands to all terms defined as synonyms.
• Narrow Term - expands to all terms defined as narrower (lower-level) terms.
• Broader Term - expands to all terms defined as broader (higher-level) terms.
• Top Term -
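To make the "sound alike" idea behind the Soundex operator concrete, here is a minimal Python sketch of the classic American Soundex code. It is an illustration only: the function name `soundex` is hypothetical, and neither Oracle Text nor SQL Server is claimed to use exactly this implementation.

```python
# Letter-to-digit table for American Soundex (illustrative sketch).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex code of a word ('' if it has no letters)."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not separate letters with the same code
        code = CODES.get(ch, "")  # vowels map to "" and reset the run
        if code and code != prev:
            result += code
        prev = code
        if len(result) == 4:
            break
    return result.ljust(4, "0")  # pad short codes with zeros


print(soundex("Robert"), soundex("Rupert"))
```

Two terms are treated as sounding alike when their codes are equal, which is why a Soundex search for one spelling can return the other.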
Search Operators used in SQL Server
• CONTAINS can search for:
  • A word near another word.
  • The prefix of a word or phrase.
  • Soundex function (to search for words that sound alike).
  • A word inflectionally generated from another (for example, the word drive is the inflectional stem of drives, drove, driving, and driven).
  • A word that is a synonym of another word using a thesaurus (for example, the word metal can have synonyms such as aluminum and steel).
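The thesaurus-based search described above can be pictured as query expansion: the search term is replaced by itself plus its synonyms before matching. The following Python sketch illustrates the idea with the metal example from the slide; `THESAURUS`, `expand_query`, and `matches` are hypothetical names, and this is not how either product implements thesaurus search internally.

```python
import re

# Toy thesaurus using the example from the slide (assumed structure).
THESAURUS = {"metal": ["aluminum", "steel"]}


def expand_query(term, thesaurus):
    """Return the term followed by its synonyms, all lower-cased."""
    term = term.lower()
    return [term] + [s for s in thesaurus.get(term, []) if s != term]


def matches(document, term, thesaurus):
    """True if the document contains the term or any of its synonyms."""
    words = set(re.findall(r"[a-z]+", document.lower()))
    return any(t in words for t in expand_query(term, thesaurus))


print(expand_query("metal", THESAURUS))
print(matches("The frame is made of steel.", "metal", THESAURUS))
```

A document that mentions only steel is still found by a query for metal, which is the behavior the synonym search is meant to provide.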
Feature                       Oracle    Microsoft
Available in                  SE, EE    EE
Decision Tree                 Y         Y
Support Vector Machine        Y         N
Neural Network                N         Y
Naive Bayes                   Y         Y
Adaptive Bayes Network        Y         N
K-means                       Y         Y
Expectation Maximization      N         Y
Orthogonal Clustering         Y         N
Path cluster                  N         Y
Minimum Description Length    Y         N
Time Series                   Y         Y
Association Rules             Y         Y
Note: Minimum Description Length identifies the relative importance of an attribute in predicting a given outcome.
Oracle emphasizes PL/SQL statements
Simple Prediction Query
Question:
Select all customers who have a high propensity to attrite (> 80% chance)
SQL Query:
SELECT A.cust_name, A.contact_info
FROM customers A
WHERE PREDICTION_PROBABILITY(tree_model, 'attrite' USING A.*) > 0.8;
An Example of Oracle Text Mining
• Building a decision tree (DT) model
CREATE TABLE dt_settings (
  setting_name  VARCHAR2(30),
  setting_value VARCHAR2(30));

BEGIN
  -- Populate the settings table (same table that CREATE_MODEL reads below)
  INSERT INTO dt_settings VALUES
    (dbms_data_mining.algo_name, dbms_data_mining.algo_decision_tree);
  COMMIT;

  -- Build a classification model that predicts sales_type
  DBMS_DATA_MINING.CREATE_MODEL(
    model_name          => 'sales_type_model',
    mining_function     => dbms_data_mining.classification,
    data_table_name     => 'sales_dataset',
    case_id_column_name => 'sales_id',
    target_column_name  => 'sales_type',
    settings_table_name => 'dt_settings');
END;
/
An Example of SQL Server Text Mining
• A Tutorial for Text Classification using SQL Server 2005 Beta2 Data Mining
• Peter Pyungchul Kim
• SQL Business Intelligence
• Microsoft Corporation
• http://www.sqlserverdatamining.com/dmcommunity/_tutorials/688.aspx
Data Source
• 5000 postings from 5 news groups
• We know which posting belongs to which group
• Flat text file
• Goal: create a model based on these data to classify each posting to its group
• Randomly choose 70% for training and 30% for testing.
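The random 70/30 split can be sketched in a few lines of Python. This is an illustration of the idea rather than the tutorial's actual procedure; the function name `split_postings` and the fixed `seed` are assumptions added so the split is reproducible.

```python
import random


def split_postings(postings, train_fraction=0.7, seed=0):
    """Shuffle a copy of the postings and cut it into (training, testing)."""
    shuffled = list(postings)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Stand-in for the 5000 postings: integer IDs instead of real text.
train, test = split_postings(range(5000))
print(len(train), len(test))
```

Every posting lands in exactly one of the two sets, so the test set measures how well the model classifies postings it has not seen during training.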
SQL Server
• You can do it by clicking through the SQL Server GUI tools.
• 1. SQL Mgmt Studio - Create database, import the data
• 2. Business Intelligence Development Studio – Build a dictionary, term vectors.
• 3. Build/Test data mining models
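Step 2 above builds a dictionary and term vectors through the GUI. As a conceptual illustration of what those two artifacts are, here is a small Python sketch; the names `tokenize`, `build_dictionary`, and `term_vector`, the `min_count` parameter, and the sample sentences are all hypothetical, and the SQL Server tooling may tokenize and weight terms differently.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def build_dictionary(documents, min_count=1):
    """Sorted list of terms that occur at least min_count times overall."""
    counts = Counter()
    for doc in documents:
        counts.update(tokenize(doc))
    return sorted(t for t, c in counts.items() if c >= min_count)


def term_vector(text, dictionary):
    """Count how often each dictionary term occurs in the text."""
    counts = Counter(tokenize(text))
    return [counts[t] for t in dictionary]


docs = ["The team won the game", "New graphics card for the game"]
dictionary = build_dictionary(docs)
print(dictionary)
print(term_vector(docs[0], dictionary))
```

Each posting becomes a fixed-length vector of term counts, which is the tabular input a classification model needs in step 3.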
Compare Classification Results