Copyright © 2017, Oracle and/or its affiliates. All rights reserved. |
Oracle Autonomous Database: What Every DBA Should Know
Sandesh Rao, VP - Autonomous Database Health & Machine Learning
Copyright © 2016, Oracle and/or its affiliates. All rights reserved. |
Safe Harbor Statement
The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remains at the sole discretion of Oracle.
Confidential – Oracle Restricted
Theme
1. Tools or features that provide some function
2. Automation around some of these tools or features
3. Components or products that use machine learning to solve specific use cases
4. Additional ML tools that can be applied to 1, 2, or the results of 3 to develop different outcomes, for:
   1. People who know data science
   2. People who want to use it through prebuilt models
Agenda
1. Journey to Autonomous Database
2. Machine learning basics & use cases
Oracle’s Vision for Autonomous Database
• Self-Driving: the user defines service levels, and the database makes them happen
• Self-Securing: protection from both external attacks and malicious internal users
• Self-Repairing: automated protection from all downtime
Journey to Autonomous Database
• Oracle has been developing sophisticated database automation for decades
Oracle Database 9i, 10g
• Automatic Storage Management (ASM)
• Automatic Memory Management
• Automatic Database Diagnostic Monitor (ADDM)
• Automatic Workload Repository (AWR)
• Automatic Undo Tablespaces
• Automatic Segment Space Management
• Automatic Statistics Gathering
• Automatic Standby Management (Broker)
• Automatic Query Rewrite
Oracle Database 11g, 12c
• Automatic SQL Tuning
• Automatic Workload Replay
• Automatic Capture of SQL Monitor
• Automatic Data Optimization
• Automatic Storage Indexes
• Automatic Columnar Cache
• Automatic Diagnostic Framework
• Automatic Refresh of Database Cloning
• Autonomous Health Framework
Database Operations Runtime Management
• Human reactions come too late and do not scale
• Manual triage and floods of notifications do not scale
• Solving these challenges requires a holistic approach:
  – Prevent problems and optimize solutions in real time
  – Recover from failures and identify the root cause quickly with minimal intervention
• Applied machine learning techniques can respond effectively in real time without significant impact on operations
Prevention and Recovery Pillars
Journey to Autonomous Database
• Cloud enables Oracle to deliver a fully Autonomous Database:
  – Expanded database automation
  – Integrated with complete infrastructure automation
  – With additional automation for operations, HA, security, etc.
One Autonomous Database – Optimized by Use Case
Oracle Autonomous Database, optimized for:
• Enterprise OLTP and Mixed Workloads
• Data Warehousing
• Departments, Developers
Timeline: 2017, 2018, Now
Autonomous Database Cloud for Data Warehouse
• Easy
  – Automatically optimizes analytic workloads
  – Simply “load and go”
  – The database tunes itself: no need to define indexes, partitions, materialized views, etc.
  – Works with any BI analytics tool
• Fast
  – Based on Exadata technology
  – Performance matches or exceeds most hand-tuned data warehouses
• Elastic
  – Instant scaling of compute or storage with no downtime
  – Pay for compute only when it is in use
Expected CY 2017
Autonomous Database Cloud For OLTP or Mixed Workloads
Expected CY 2018
• Easy
  – Configured for mission-critical workloads
    • Full Maximum Availability Architecture with scale-out clustering and disaster recovery
  – Or configured for low cost
    • Single server for non-critical workloads or test/dev
• Fast
  – Based on Exadata technology
• Elastic
  – Instant scaling of compute or storage with no downtime
Full End-to-End Automation
• Full automation must cover a large number of tasks:
  – Set up and provision software using gold images
  – Provision scale-out clusters and disaster recovery automatically
  – Perform switchovers and failovers with defined parameters
  – Patch, upgrade, and back up online using RAC, ASM, and Clusterware
  – Monitor, scale, and diagnose performance
  – Tune, optimize, and use new ATO features
  – Test and manage change for complex applications and workloads
  – Automatically handle failures and errors, including log file lifecycle management
  – Isolate workloads and set up multitenancy using Container Databases
  – Take advantage of infrastructure such as containers (Docker, Kubernetes) for app deployment
Smart Collection: Trace File Analyzer
• Always on: enabled by default
• Improved, comprehensive first-failure diagnostics collection
• Efficiently collects, packages, and transfers diagnostic data to Oracle Support
• Reduces round trips between customers and Oracle
• Transfers data to centralized storage for detailed analysis with TFA Service
• Supports Database 10.2 and above
• Included since 11.2.0.4 and 12.1.0.2, and updated in patch sets & PSUs
Autonomous Usage
Components: Oracle Grid Infrastructure & Databases, TFA, Oracle Support
1. TFA detects a fault
2. Diagnostics are collected
3. Distributed diagnostics are consolidated and packaged
4. Notification of the fault is sent
5. The diagnostic collection is uploaded to Oracle Support for root cause analysis & resolution
Faster & Easier SR Data Collection
tfactl diagcollect -srdc <srdc_type> -sr <SR#>
Type of problem: SRDC type(s)
• ORA errors: ORA-00020, ORA-00060, ORA-00600, ORA-00700, ORA-01555, ORA-01628, ORA-04030, ORA-04031, ORA-07445, ORA-27300, ORA-27301, ORA-27302, ORA-30036
• Other internal database errors: internalerror
• Database performance: dbperf
• Database patching: dbpatchinstall, dbpatchconflict
• Database resource: dbunixresources
• XDB installation or invalid object: dbxdb
• Database install / upgrade: dbinstall, dbupgrade, dbpreupgrade
• Database storage: asm
• Excessive SYSAUX space used by the Automatic Workload Repository (AWR): dbawrspace
• Database startup / shutdown: dbshutdown, dbstartup
• Data Guard: dbdataguard
• Enterprise Manager tablespace usage metric: emtbsmetrics
• Enterprise Manager general metrics page or threshold problems (run all three SRDCs): emdebugon, emdebugoff, emmetricalert
• Enterprise Manager target discovery / add: emcliadd, emclusdisc, emdbsys, emgendisc, emprocdisc
• Enterprise Manager OMS restart: emrestartoms
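The command template above is easy to wrap in a script. The sketch below is a hypothetical Python helper: `KNOWN_SRDCS` and `diagcollect_command` are illustrative names, and the set copies only a few SRDC names from the table. It relies solely on the `-srdc` and `-sr` options shown on this slide and builds the argument list rather than running it, so it can be checked without TFA installed.

```python
import shlex

# Hypothetical subset of SRDC names copied from the table above.
KNOWN_SRDCS = {
    "ORA-00600", "ORA-04031", "ORA-07445",
    "internalerror", "dbperf", "dbpatchinstall", "dbpatchconflict",
    "dbawrspace", "dbdataguard", "dbstartup", "dbshutdown", "asm",
}


def diagcollect_command(srdc_type: str, sr_number: str) -> list[str]:
    """Build the argv for 'tfactl diagcollect -srdc <srdc_type> -sr <SR#>'."""
    if srdc_type not in KNOWN_SRDCS:
        raise ValueError(f"unknown SRDC type: {srdc_type}")
    if not sr_number.strip():
        raise ValueError("an SR number is required")
    return ["tfactl", "diagcollect", "-srdc", srdc_type, "-sr", sr_number]


cmd = diagcollect_command("dbperf", "3-0000000001")  # placeholder SR number
print(shlex.join(cmd))
# On a host with TFA installed the list could be executed with:
# import subprocess; subprocess.run(cmd, check=True)
```

Returning an argument list (instead of a shell string) avoids quoting problems if the SR number or SRDC name ever comes from user input.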
TFA Service
Oracle RAC 18c
• TFA Service is set up as part of the Domain Services Cluster (DSC) setup
  – Runs on the first node of the cluster
• The web admin account is locked at start
  – To unlock: tfactl receiver reset webadmin
• For general info: tfactl receiver info
$ tfactl receiver info
TFA Service URL : http://mys66:7070/tfa/index.html
TFA Service URL (https) : https://mys66:7071/tfa/index.html
TFA Service Admin User : admin
TFA Service Admin Status : active
TFA Service Repository : /scratch/app/oragrid/tfa/repository
TFA Service Port : 7001
TFA Service Members :
Oracle Domain Services Cluster: IO Service, ACFS Services, ASM Service, TFA Service, Management Service, RHP Service, Shared ASM
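A script that needs the receiver details (for example, the HTTPS URL) can parse the `key : value` lines of the `tfactl receiver info` output shown on this slide. This is a minimal sketch that assumes the output keeps that one-pair-per-line layout; `parse_receiver_info` is a hypothetical name, and the sample text is copied from the slide.

```python
SAMPLE = """\
TFA Service URL : http://mys66:7070/tfa/index.html
TFA Service URL (https) : https://mys66:7071/tfa/index.html
TFA Service Admin User : admin
TFA Service Admin Status : active
TFA Service Repository : /scratch/app/oragrid/tfa/repository
TFA Service Port : 7001
TFA Service Members :
"""


def parse_receiver_info(text: str) -> dict[str, str]:
    """Split each 'key : value' line at its first colon (URLs keep their own colons)."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that carry no key/value pair
            info[key.strip()] = value.strip()
    return info


info = parse_receiver_info(SAMPLE)
print(info["TFA Service URL (https)"])
```

Splitting at the first colon works here because none of the keys contain a colon, while values such as URLs do.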
Domain Services Cluster Installation steps
TFA Service – Cluster / Host Health View
Oracle RAC 18c
1. View the cluster heat map to see potential issues and drill into host level
2. View frequency of events
3. View recent TFA collections
4. View timeline of important events
5. Choose between component health or utilization
TFA Service – Cluster / Host Utilization View
Oracle RAC 18c
1. View heat map for utilization hotspots
2. View utilization graphs
3. Hover on a section to see more information
Why Oracle ORAchk & EXAchk
• Automatic proactive warning of problems before they impact you
• Health checks for the most impactful recurring problems
• Runs in your environment with no need to send anything to Oracle
• Scheduled health reports sent to you by email
• Findings can be integrated into other tools of choice
• EXAchk (Engineered Systems) and ORAchk (non-Engineered Systems) share a common framework
Upgrade to Database 12.2 with confidence
• New checks to help when upgrading the database to 12.2
• Both pre- and post-upgrade verification to prevent problems related to:
  – OS configuration
  – Grid Infrastructure & Database patch prerequisites
  – Database configuration
  – Cluster configuration
• Pre upgrade: orachk -u -o pre
• Post upgrade: orachk -u -o post
Autonomous Health via Machine Learning
Journey to Autonomous Database Cloud (timeline: 2014, 2015, 2016, 2017, 2018+)
• Real-time health monitoring of compliance, performance, availability & capacity
• Automated analysis & anomaly detection
• Automated & targeted diagnostic collections (50+ top areas & growing)
• Automated health checks
• Log masking, reduction & diagnostic collections
• Automated repair
• Automated log lifecycle management
• Preemptive fault prediction & correction
• Automated environment correlation for fault prioritization & flood control
• Automated workload forecasting
• 2015: Integration of database support tools
Confidential – Gartner OPDBMS Vendor Briefing
Machine Learning Use Cases
1. Machine learning basics
2. Log reduction & anomaly timeline
3. Maintenance slot identification
4. Detect performance problems
5. Problem signatures from event paths
3 Key Areas of Machine Learning
• Analytics: knowledge discovery
• Machine Learning: learn & get better from experience
• Artificial Intelligence: simulate human intelligence
Examples of Machine Learning Problem Types
• Classifiers: predict a label classification
  – Example: classify whether a particular log entry is normal or not
• Regression: predict a value
  – Example: predict when a system will run out of memory
• Clustering: form groups by discovering recurring patterns
  – Example: group incidents into collections of similar ones that share some common attributes
Machine Learning Categories
• Supervised Learning: predict future outcomes with the help of training data provided by human experts
• Semi-Supervised Learning: discover patterns within raw data and make predictions, which are then reviewed by human experts who provide feedback that is used to improve model accuracy
• Unsupervised Learning: find patterns without any external input other than the raw data
• Reinforcement Learning: make decisions based on past rewards for this type of action
Real-time Prevention and Rapid Recovery
• Data Ingestion
  – Kernel Smoothing and Moving Average
  – Interpolation and Imputation
• Prediction and Pattern Recognition
  – Multivariate and Auto-Associative Regression
  – Clustering, Similarity Operators and Bayes Networks
• Fault and Anomaly Detection
  – Sequential Probability Ratio Tests
  – Conditional Probability Filters & Hidden Markov Models
• Prognosis and Diagnosis
  – Bayesian Belief Networks and Probabilistic Inference
  – Remaining Useful Life Regression and GPM Models
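Among the fault-detection techniques listed above, the Sequential Probability Ratio Test is compact enough to sketch. The version below is Wald's SPRT for a shift in the mean of a Gaussian metric. It restarts after every decision so that it can monitor a stream. The baseline mean, the shifted mean, the standard deviation, and the error rates are hypothetical parameters that you would estimate from healthy data. This is not presented as the algorithm used in Oracle's products.

```python
import math


def sprt_alarms(samples, mu0, mu1, sigma, alpha=0.01, beta=0.01):
    """Wald's SPRT, H0: mean == mu0 (healthy) vs H1: mean == mu1 (degraded).

    alpha is the false-alarm rate and beta the missed-detection rate.
    Returns the indices at which H1 is accepted; the test restarts after each decision.
    """
    upper = math.log((1 - beta) / alpha)  # accept H1 at or above this
    lower = math.log(beta / (1 - alpha))  # accept H0 at or below this
    llr = 0.0
    alarms = []
    for i, x in enumerate(samples):
        # Log-likelihood ratio increment for two Gaussians with equal variance.
        llr += (mu1 - mu0) / sigma**2 * (x - (mu0 + mu1) / 2)
        if llr >= upper:
            alarms.append(i)
            llr = 0.0
        elif llr <= lower:
            llr = 0.0
    return alarms


# Hypothetical latency samples (ms): healthy around 10, degraded around 14.
samples = [10, 10, 10, 10, 14, 14, 14, 14, 14, 14]
print(sprt_alarms(samples, mu0=10, mu1=14, sigma=2))
```

Unlike a fixed threshold on single samples, the SPRT accumulates evidence. A few mildly elevated readings therefore raise an alarm with bounded false-alarm and miss rates, while an isolated spike does not.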
Autonomous Health Platform ML Technologies
• Data Ingestion
  – ELK
  – Lucene
• Prediction and Pattern Recognition
  – TF-IDF and Bag-of-Words modelling
  – Sequence Matcher
  – K-nearest Neighbour
• Fault and Anomaly Detection
  – Decision Trees and Random Forest
  – Sequential Pattern Mining
• Prognosis and Diagnosis
  – Recurrent Neural Network
  – Long Short-Term Memory (LSTM) Predictive Analysis
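Two items from the list above, TF-IDF bag-of-words modelling and k-nearest neighbour, combine into a simple log-line classifier. This is a minimal pure-Python sketch that assumes a small hand-labelled training set. The log lines, labels, tokenizer, and the `TfidfKnn` class are all hypothetical. Numbers are collapsed into a single token so that lines differing only in IDs look alike.

```python
import math
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z]+|\d+")


def tokenize(line):
    """Lower-case word tokens; every number becomes the placeholder '<num>'."""
    return ["<num>" if t.isdigit() else t for t in TOKEN_RE.findall(line.lower())]


class TfidfKnn:
    def __init__(self, lines, labels, k=3):
        self.k = k
        self.labels = list(labels)
        docs = [Counter(tokenize(line)) for line in lines]
        df = Counter(token for doc in docs for token in doc)  # document frequency
        n = len(docs)
        self.idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
        self.vectors = [self._vectorize(doc) for doc in docs]

    def _vectorize(self, counts):
        """L2-normalised TF-IDF vector; tokens never seen in training are ignored."""
        vec = {t: c * self.idf[t] for t, c in counts.items() if t in self.idf}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}

    def predict(self, line):
        query = self._vectorize(Counter(tokenize(line)))
        # Cosine similarity is the dot product of two unit vectors.
        scored = sorted(
            (sum(w * vec.get(t, 0.0) for t, w in query.items()), label)
            for vec, label in zip(self.vectors, self.labels)
        )
        nearest = [label for _, label in scored[-self.k:]]
        return Counter(nearest).most_common(1)[0][0]


train = [
    ("Thread 1 advanced to log sequence 101", "normal"),
    ("Thread 1 advanced to log sequence 102", "normal"),
    ("Completed checkpoint up to RBA 45", "normal"),
    ("ORA-00600: internal error code, arguments: [1234]", "anomalous"),
    ("ORA-07445: exception encountered: core dump", "anomalous"),
    ("ORA-00600: internal error code, arguments: [5678]", "anomalous"),
]
model = TfidfKnn([line for line, _ in train], [label for _, label in train], k=3)
print(model.predict("Thread 2 advanced to log sequence 350"))
```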
Log reduction & Anomaly timeline
Remove the noise from thousands of log events and metrics to identify key events revealing what happened, in what order and why
Anomaly Detection – High Level
• Log collection: the log files of file type 1, file type 2, … file type n are scanned entry by entry
• Known normal log entries are discarded; probable anomalous lines are collected
• The probable anomalies are assembled into an anomaly timeline
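The flow above (discard known normal entries, collect probable anomalies, order them into a timeline) can be sketched with Python's `difflib.SequenceMatcher`, which echoes the "Sequence Matcher" item in the technology list. This is a minimal sketch that assumes entries arrive as `(timestamp, file type, message)` tuples with sortable ISO timestamps. The masking rule, the 0.9 similarity threshold, the function names, and the sample lines are hypothetical.

```python
import re
from difflib import SequenceMatcher

VOLATILE = re.compile(r"0x[0-9a-f]+|\d+")


def mask(message):
    """Replace hex values and numbers so lines that differ only in IDs match."""
    return VOLATILE.sub("<*>", message.lower())


def anomaly_timeline(entries, known_normal, threshold=0.9):
    """Keep entries unlike every known-normal line, sorted by timestamp."""
    templates = [mask(m) for m in known_normal]
    anomalies = []
    for ts, file_type, message in entries:
        masked = mask(message)
        best = max(
            (SequenceMatcher(None, masked, t).ratio() for t in templates),
            default=0.0,
        )
        if best < threshold:  # probable anomaly: collect it
            anomalies.append((ts, file_type, message))
        # otherwise it resembles a known normal entry and is discarded
    return sorted(anomalies)


known_normal = [
    "Thread 1 advanced to log sequence 101",
    "Completed checkpoint up to RBA 45",
]
entries = [
    ("2018-04-11T10:00:02", "alert.log", "Thread 1 advanced to log sequence 202"),
    ("2018-04-11T10:00:05", "crsd.trc", "node 2 evicted from cluster"),
    ("2018-04-11T10:00:03", "alert.log", "Completed checkpoint up to RBA 99"),
    ("2018-04-11T10:00:01", "alert.log", "ORA-00600: internal error code"),
]
for event in anomaly_timeline(entries, known_normal):
    print(*event)
```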
Autonomous Health Analysis, e.g. Trace File Analyzer
Auto Recommendation
Autonomous Health – TFA Anomaly Timeline
Maintenance slot identification
Find the next best window of time in which maintenance can be performed with minimal service impact
Maintenance slot identification
• Use case
  – Identify an appropriate maintenance window for performing maintenance activity based on historical workload patterns
• Inputs (training data)
  – Average Active Sessions (AAS) in sliding-window format. This metric is important because it is the best representation of your database system load. Preferably, use the last 30 days of data points before making the prediction.
• AAS = DB Time / Elapsed Time
• In other words, AAS is time-normalized DB Time
• From DB tables:
  – V$ACTIVE_SESSION_HISTORY => COUNT(*) = DB Time in seconds (circular buffer of ~4 hours)
  – DBA_HIST_ACTIVE_SESS_HISTORY => 10 * COUNT(*) = DB Time in seconds (only one in 10 samples is retained)
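The AAS definition above translates directly into code. Each ASH row represents one active session observed in one sample, so the DB time for a bucket is the row count times the sample interval (1 second for V$ACTIVE_SESSION_HISTORY, 10 seconds for DBA_HIST_ACTIVE_SESS_HISTORY). AAS is that DB time divided by the bucket length. This is a minimal sketch that assumes the sample timestamps (for example, the SAMPLE_TIME column) have already been fetched. The function name and the hourly bucketing are illustrative choices.

```python
from collections import Counter
from datetime import datetime, timedelta


def hourly_aas(sample_times, sample_interval_s=1):
    """Average Active Sessions per hour from ASH row timestamps.

    sample_interval_s: 1 for V$ACTIVE_SESSION_HISTORY rows,
    10 for DBA_HIST_ACTIVE_SESS_HISTORY rows (one in 10 samples kept).
    Hours with no rows are simply absent (their AAS is 0).
    """
    rows_per_hour = Counter(
        t.replace(minute=0, second=0, microsecond=0) for t in sample_times
    )
    # AAS = DB time / elapsed time = (rows * interval) / 3600 s
    return {
        hour: rows * sample_interval_s / 3600
        for hour, rows in sorted(rows_per_hour.items())
    }


# Hypothetical data: two active sessions in every 1-second sample for one hour.
start = datetime(2018, 4, 11, 15, 0, 0)
ash_rows = [start + timedelta(seconds=i // 2) for i in range(7200)]
print(hourly_aas(ash_rows))
```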
Maintenance slot identification
• Seasonal decomposition
  – Using an observed time series, extract a number of component series, each of which has a certain characteristic or type of behavior
  – Time series decomposition:
    • Trend
      – The trend component at time t, which reflects the long-term progression of the series
      – A trend exists when there is a persistent increasing or decreasing direction in the data
    • Seasonality
      – The seasonal component at time t, reflecting seasonality
      – Seasonality occurs over a fixed and known period (e.g., the quarter of the year, the month, or the day of the week)
    • Residual
      – The irregular component (or "noise") at time t, which describes random, irregular influences
      – It represents the remainder of the time series after the other components have been removed
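The decomposition described above can be sketched as a classical additive decomposition. A centred moving average (a convolution filter) estimates the trend. The detrended values, averaged per position in the cycle, give the seasonal component, and whatever remains is the residual. This is a minimal sketch that assumes evenly spaced observations, a known period (24 for hourly data with a daily cycle), and at least two full periods. The function names are hypothetical, and in practice a library routine such as statsmodels' `seasonal_decompose` would normally be used instead. Applying the sketch to log-transformed values turns the additive model into a multiplicative one.

```python
def centred_moving_average(series, period):
    """Trend estimate; None where the window does not fit."""
    half = period // 2
    trend = [None] * len(series)
    for i in range(half, len(series) - half):
        window = series[i - half:i + half + 1]
        if period % 2:  # odd period: plain average of `period` points
            trend[i] = sum(window) / period
        else:  # even period: 2 x m average, half weight on both ends
            trend[i] = (0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]) / period
    return trend


def decompose(series, period):
    """Classical additive decomposition: series = trend + seasonal + residual."""
    if len(series) < 2 * period:
        raise ValueError("need at least two full periods")
    trend = centred_moving_average(series, period)
    phase_means = []
    for k in range(period):
        detrended = [series[i] - trend[i]
                     for i in range(k, len(series), period) if trend[i] is not None]
        phase_means.append(sum(detrended) / len(detrended))
    offset = sum(phase_means) / period  # centre the seasonal indices on zero
    seasonal = [phase_means[i % period] - offset for i in range(len(series))]
    residual = [None if t is None else x - t - s
                for x, t, s in zip(series, trend, seasonal)]
    return trend, seasonal, residual


# Hypothetical data: a linear trend plus a repeating 4-step pattern.
pattern = [1.0, -1.0, 2.0, -2.0]
series = [10 + 0.5 * i + pattern[i % 4] for i in range(12)]
trend, seasonal, residual = decompose(series, period=4)
print([round(s, 6) for s in seasonal[:4]])
```

With a linear trend and a seasonal pattern that sums to zero over one period, the centred average removes the pattern exactly. The sketch therefore recovers the pattern and leaves zero residuals, which makes it a convenient sanity check.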
Maintenance Slot Identification
1. Original observation data
2. Apply convolution filter & average
3. Calculate seasonality
4. Use seasonality to predict the best maintenance window
START_TIME             CNT   SEASONAL    LN(CNT)
2018-04-11 15:00:00    290  -0.226098   5.669881
2018-04-11 16:00:00  31120  -0.069821  10.345606
2018-04-11 17:00:00  21530  -0.350088   9.977203
2018-04-11 18:00:00  26240  -0.187483  10.175040
2018-04-11 19:00:00  40520  -0.513240  10.609551
2018-04-11 20:00:00  54270   0.019737  10.901727
2018-04-11 21:00:00  51460   0.059213  10.848560
2018-04-11 22:00:00  44310  -0.011312  10.698966
2018-04-11 23:00:00  25690  -0.179156  10.153857
Current Date : 2018-05-12 15:00:00
Current Position in Seasonality : -0.22609829742533585
Best Maintenance Period in next Cycle : 2018-05-12 19:00:00
Worst Maintenance Period in next Cycle : 2018-05-13 08:00:00
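The final step, using seasonality to predict the best maintenance window, reduces to scanning the next full cycle and picking the lowest and highest seasonal values. This is a minimal sketch that makes three assumptions: hourly seasonal indices come from a decomposition with a 24-hour period, `cycle_start` is a timestamp that corresponds to index 0, and the current time is aligned to the hour. The function name and the index values in the usage are hypothetical.

```python
from datetime import datetime, timedelta


def next_cycle_windows(seasonal_idx, cycle_start, now, step=timedelta(hours=1)):
    """Return (best, worst) start times in the cycle beginning at `now`.

    Best = lowest seasonal value (least load), worst = highest.
    """
    period = len(seasonal_idx)
    phase_now = ((now - cycle_start) // step) % period
    candidates = [
        (seasonal_idx[(phase_now + k) % period], now + k * step)
        for k in range(period)
    ]
    return min(candidates)[1], max(candidates)[1]


# Hypothetical hourly indices: quietest at 03:00, busiest at 14:00.
seasonal_idx = [0.0] * 24
seasonal_idx[3] = -0.5
seasonal_idx[14] = 0.4
best, worst = next_cycle_windows(
    seasonal_idx, cycle_start=datetime(2018, 4, 11), now=datetime(2018, 5, 12, 15)
)
print("Best:", best, "Worst:", worst)
```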
Anomaly Detection with OS and ASH Data
Detect performance problems
Cluster Health Advisor – Applied Machine Learning
Discovers Potential Cluster & DB Problems
• Fault-data-driven model development
• Purpose-built applied ML for knowledge extraction
• Expert dev team scrubs the data
• Generates Bayesian Network (BN)-based diagnostic root-cause models
• Uses BN-based runtime models to perform real-time prognostics
Flow: log, ASH, and metrics data are scrubbed by the CHA dev team and fed to ML knowledge extraction under expert supervision. This produces BN models that become the CHA runtime model, and model feedback flows back into development.
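The Bayesian Network root-cause idea can be illustrated with the simplest network shape: a single cause node whose symptoms are conditionally independent given the cause (a naive Bayes structure). Given the observed symptoms, Bayes' rule yields a posterior over candidate causes. All cause names, symptoms, and probabilities below are hypothetical, and the sketch does not describe CHA's actual models.

```python
def root_cause_posterior(priors, likelihoods, observed):
    """Posterior P(cause | observed symptoms) for a cause -> symptoms network.

    priors:      {cause: P(cause)}
    likelihoods: {cause: {symptom: P(symptom present | cause)}}
    observed:    {symptom: True/False}
    Assumes symptoms are conditionally independent given the cause.
    """
    joint = {}
    for cause, prior in priors.items():
        p = prior
        for symptom, present in observed.items():
            q = likelihoods[cause][symptom]
            p *= q if present else 1 - q
        joint[cause] = p
    evidence = sum(joint.values())  # P(observed symptoms)
    return {cause: p / evidence for cause, p in joint.items()}


priors = {"healthy": 0.90, "storage_latency": 0.06, "cpu_saturation": 0.04}
likelihoods = {
    "healthy": {"high_log_file_sync": 0.05, "high_cpu": 0.10},
    "storage_latency": {"high_log_file_sync": 0.90, "high_cpu": 0.20},
    "cpu_saturation": {"high_log_file_sync": 0.40, "high_cpu": 0.95},
}
posterior = root_cause_posterior(
    priors, likelihoods, {"high_log_file_sync": True, "high_cpu": False}
)
for cause, p in sorted(posterior.items(), key=lambda kv: -kv[1]):
    print(f"{cause}: {p:.3f}")
```

Even with a 90% prior on "healthy", the combination of slow log file sync and normal CPU makes storage latency slightly more likely. This is the kind of evidence-weighing that a BN-based diagnostic model performs at larger scale.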
Cluster Health Advisor: Data Flow Overview
Models Capture the Dynamic Behavior of All Normal Operation
[Figure: Models capture all normal operating modes. IOPS, user commits (/sec), log file parallel write (usec), and log file sync (usec) plotted over time, 10:00 to 6:00.]
• The release ships with conservative models to minimize false warnings
• A model captures the normal load phases and their statistics over time, and thus the characteristics for all load intensities and profiles. During monitoring, any data point similar to one of the vectors is NORMAL.
• One could say that the model REMEMBERS the normal operational dynamics over time
In-Memory Reference Matrix (part of the “normality” model)
IOPS                      …   2500    4900    800     …
User Commits              …   10000   21000   4400    …
Log File Parallel Write   …   2350    4100    22050   …
Log File Sync             …   5100    9025    4024    …
…                         …   …       …       …       …
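The statement that "any data point similar to one of the vectors is NORMAL" can be sketched as a nearest-neighbour check against the reference matrix above. This minimal sketch makes three assumptions: the three fully visible columns of the matrix are the reference vectors, each metric is scaled by its largest reference value, and a point is normal when its Euclidean distance to the nearest vector is at most a threshold. The scaling, the 0.2 threshold, and the test points are hypothetical and do not describe CHA's actual method.

```python
import math

# Columns of the reference matrix above, one tuple per "normal" vector:
# (IOPS, user commits/sec, log file parallel write usec, log file sync usec)
REFERENCE = [
    (2500, 10000, 2350, 5100),
    (4900, 21000, 4100, 9025),
    (800, 4400, 22050, 4024),
]


def nearest_reference_distance(point, reference=REFERENCE):
    """Scaled Euclidean distance from `point` to its closest reference vector."""
    # Scale each metric by its largest reference value so no metric dominates.
    scales = [max(abs(vec[i]) for vec in reference) or 1.0 for i in range(len(point))]
    return min(
        math.sqrt(sum(((p - r) / s) ** 2 for p, r, s in zip(point, vec, scales)))
        for vec in reference
    )


def is_normal(point, threshold=0.2):
    """NORMAL if the point is similar to at least one reference vector."""
    return nearest_reference_distance(point) <= threshold


print(is_normal((2600, 10500, 2400, 5200)))   # close to the first vector
print(is_normal((2500, 10000, 20000, 30000)))  # far from every vector
```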
Problem Signatures from Event Paths
Identify a series of events as connected and representing the signature of a problem
Longest Common Subsequence of Anomalous Entries
1. Start by classifying a problem, such as an important ORA or CRS error
2. Find occurrences of the problem across many different log files
3. Identify anomalous entries and lifecycle events in chronological order
4. Compare the repeating anomalous entries to identify the true anomalous entries
   – These represent the problem signature
   – Sequences of events are correlated by component, log file, host & thread
Find the Finite State Automaton (FSA)
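The comparison in step 4 can be sketched with the classic dynamic-programming longest common subsequence, folded pairwise across occurrences. Events that appear in the same order in every occurrence survive, and interleaved noise drops out. This is a minimal sketch. Pairwise folding is a heuristic (the exact LCS of many sequences is much harder to compute), and the event labels are hypothetical placeholders.

```python
from functools import reduce


def lcs(a, b):
    """Longest common subsequence of two event lists, O(len(a) * len(b))."""
    n, m = len(a), len(b)
    # dp[i][j] = LCS length of a[i:] and b[j:]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n - 1, -1, -1):
        for j in range(m - 1, -1, -1):
            if a[i] == b[j]:
                dp[i][j] = dp[i + 1][j + 1] + 1
            else:
                dp[i][j] = max(dp[i + 1][j], dp[i][j + 1])
    # Walk the table forwards to recover one LCS.
    out, i, j = [], 0, 0
    while i < n and j < m:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif dp[i + 1][j] >= dp[i][j + 1]:
            i += 1
        else:
            j += 1
    return out


def problem_signature(occurrences):
    """Fold LCS across all occurrences of the same problem."""
    return reduce(lcs, occurrences)


# Hypothetical anomalous-event sequences from three occurrences of one problem.
occurrences = [
    ["hb_delay", "noise_a", "vote_disk_io", "reconfig", "noise_b", "evict"],
    ["hb_delay", "vote_disk_io", "noise_c", "reconfig", "evict"],
    ["noise_d", "hb_delay", "vote_disk_io", "reconfig", "evict", "noise_e"],
]
print(problem_signature(occurrences))
```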
Generalizing event signatures over the scope of a bug
• Bug Signature Repository
  – Node Eviction bug 243645 timeline: Event Signature 35, 3435, 494, 3948, 292, 434933
  – Node Eviction bug 2747747 timeline: Event Signature 3434, 3435, 4344, 3048, 202, 434983
• New signature: Event Signature 35, 3435, 3048, 3948, 292, 434933
• Check for a weighted probabilistic match
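The slide does not spell out how the weighted probabilistic match is computed. The sketch below shows one possible interpretation. Each known bug is scored by the share of the new signature's events it contains, with rare events (those found in fewer bugs) weighted more heavily in an IDF-like way. The weighting formula and the `weighted_match` name are assumptions, and the scores are similarities in [0, 1] rather than calibrated probabilities. Event order is ignored here; the longest-common-subsequence comparison can account for it. The repository contents are copied from the slide.

```python
import math

# Bug Signature Repository, copied from the slide (event signature IDs in order).
REPOSITORY = {
    "243645": [35, 3435, 494, 3948, 292, 434933],
    "2747747": [3434, 3435, 4344, 3048, 202, 434983],
}


def weighted_match(new_signature, repository=REPOSITORY):
    """Score each bug by the weighted share of the new signature's events it contains.

    Weight = log(1 + N / df), where N is the number of bugs and df the number of
    bugs containing the event (treated as 1 for unseen events), so rare events count more.
    """
    n_bugs = len(repository)
    df = {}
    for events in repository.values():
        for event in set(events):
            df[event] = df.get(event, 0) + 1
    new_events = set(new_signature)
    weight = {e: math.log(1 + n_bugs / max(df.get(e, 0), 1)) for e in new_events}
    total = sum(weight.values())
    return {
        bug: sum(weight[e] for e in new_events & set(events)) / total
        for bug, events in repository.items()
    }


scores = weighted_match([35, 3435, 3048, 3948, 292, 434933])
best = max(scores, key=scores.get)
print(best, {bug: round(s, 3) for bug, s in scores.items()})
```

Event 3435 appears in both bugs, so it carries less weight than the events unique to one bug. The new signature therefore matches bug 243645 much more strongly than bug 2747747.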
• Thank you for your feedback!
• Please continue to reach out to us via social media
Twitter: @sandeshr
LinkedIn: https://www.linkedin.com/in/raosandesh/
Questions?