Creating Insights at the Speed of Business
W. Daniel Cox, III CPA, CMA, CFM
Chief Executive Officer
WELCOME to
Meet Up Group
Energise Organisational Advantage through Awareness and Insight
Registration & Networking
Keynote – Dan Cox, CEO of Data Transformed
KNIME & Harvest Analytics – Tom Park
Office of State Revenue Case Study – Anand Antony
Using Spark with KNIME – Chhitesh Shrestha
Networking & Drinks
Journey to Best in Class Analytics: We Help our Clients along this Path
(Chart: Value plotted against Time)
Static: Report and Drill-down (Laggards)
Reactive: Monitor and Alert (Followers)
Proactive: Discover and Predict (Performers)
Dynamic: Analytics-enabled business processes (Innovators)
YOUR DATA. CLEARLY
Source Your Data
Realise Data Value
Prepare Your Data (Data Preparation)
Plan With Data (Budget/Planning)
Visualise All Data (Visualisation)
BUDGET PLANNING
Budgeting
Forecasting
Planning
Demand Planning
Workforce Management
Accounting
Financing
Cashflow
Sales Forecasting
Modelling
Campaign Forecasting
DATA PREPARATION
Data Governance
Data Quality
Master Data Management
Data Warehousing
Data Science
ETL Applications
Data Analytics
SQL Language
Python Language
Scripting
Database Management
Application Development
Database Development
Textual ETL
Text Analytics
Hadoop Ecosphere
Analytical Databases
Relational Databases
Microsoft Analysis Server
OLAP
OLTP
Multi-Dimensional Databases
Data Vault Architectures
Star-Schema Architectures
Data Marting
Data Transformed Skill Sets
VISUALISATION 30%
BUDGET PLANNING 20%
DATA PREPARATION 50%
VISUALISATION
Dashboarding
Reporting
Charting
Location Analytics
Statistical Analytics
Data Analytics
Business Analysis
Story Telling
Semantic Layer
Presentation Layer
Collaboration
(Chart: Enterprise Readiness, from Immature to Industrial Strength, plotted against Performance, from Slow to Fast; regions labelled Good Enough and Production Ready; options shown: Traditional Operational, Open Source, Vortex)
Actian – Fast, Industrialized, Open
Superior Big Data SQL with industrial strength
Do YOU Have a BIG DATA Role?
Global Data Snapshot
…
7,254,549,796 Total World Population
3,035,749,340 Internet Users
2,078,680,860 Active Social Network Users
6,572,950,124 Mobile Subscribers
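The snapshot figures are easier to compare as shares of the world population. A minimal Python sketch that divides each figure on this slide by the population figure; since one person can hold several mobile subscriptions, the mobile figure is best read as a ratio rather than a count of unique people.

```python
# Figures copied from the "Global Data Snapshot" slide.
WORLD_POPULATION = 7_254_549_796
SNAPSHOT = {
    "Internet Users": 3_035_749_340,
    "Active Social Network Users": 2_078_680_860,
    "Mobile Subscribers": 6_572_950_124,
}


def share_of_population(count: int, population: int = WORLD_POPULATION) -> float:
    """Return count as a percentage of the total world population."""
    return 100.0 * count / population


for label, count in SNAPSHOT.items():
    print(f"{label}: {share_of_population(count):.1f}% of world population")
```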
Challenges
• Constrains data to app
• Can’t manage new data
• Costly to scale
Business Value
Clickstream
Geolocation
Web Data
Internet of Things
Docs, emails
Server logs
2012: 2.8 Zettabytes
2020: 44 Zettabytes
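Growing from 2.8 ZB in 2012 to 44 ZB in 2020 implies a compound annual growth rate of roughly 41%, so data volume doubles about every two years. A small sketch of that arithmetic, assuming simple compound growth between the two endpoints (the slide does not say how the 2020 projection was produced):

```python
import math


def cagr(start_value: float, end_value: float, years: float) -> float:
    """Compound annual growth rate that turns start_value into end_value over `years`."""
    return (end_value / start_value) ** (1.0 / years) - 1.0


# Endpoints from the slide: 2.8 ZB in 2012 and 44 ZB in 2020.
rate = cagr(2.8, 44.0, 2020 - 2012)
# Years needed to double at that constant rate.
doubling_years = math.log(2) / math.log(1.0 + rate)
print(f"Implied growth: {rate:.1%} per year, doubling roughly every {doubling_years:.1f} years")
```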
(Chart: Traditional systems under pressure. Laggards vs. Industry Leaders; 1: Traditional (ERP, CRM, SCM); 2: New Data)
12 Zettabytes
Volume: Exponential Growth
Variety: New Data Types
Velocity: Time To Value
The Digital Floodgates have opened…
and will never be turned off again
Big Data equals Big Opportunity
Data Source & Type: Untouched
Value: New Possibilities
88% OF BIG DATA
$15 TRILLION
Universal Access: Time To Value
% OF COMPANIES
1
Trends for BIG DATA
In the Cloud
Personal ETL
NoSQL
Hadoop
Data Lake
Ecosystem
Internet of Things
Big Data Trends
1. Big Data in the Cloud
2. Personal ETL
3. NoSQL
4. Hadoop
5. Data Lakes
6. Big Data Ecosystem
7. Internet of Things
BIG DATA is STILL just Data
It needs to be translated into Answers
Acquire, Grow & Retain Customers
Who are your best customers and how can you keep them satisfied? Where can you find more customers like them?
Big data holds the insights into who your customers are and what motivates them.
Optimise Operations & Reduce Fraud
Are your operational processes and systems as efficient as they could be? Could you reduce waste and fraud if you had real-time visibility into your business?
Adopting a big data and analytics strategy can help you plan, manage and maximise operations, supply chains and the use of infrastructure assets.
Transform Financial Processes
Do you have real-time access to reliable information about all aspects of your business? Do you have the visibility, insight and control over financial performance to better measure, monitor and shape business outcomes?
Analysing all of your data, including big data, can drive enterprise agility and provide insights to help you make better decisions.
Manage Risk
How can you mitigate the financial and operational risks that could devastate your organisation? How can you manage regulatory change and reduce the risk of non-compliance?
Proactively identifying, understanding and managing financial and operational risk can enable more risk-aware, confident decision making.
Create New Business Models
Are your competitors making bigger strides than you in changing your industry or creating new markets? Does your organisation’s culture support innovative thinking and exploration?
Explore strategic options for business growth, using new perspectives gained from exploiting big data and analytics.
Improve IT Economics
Is your existing IT infrastructure able to provide the insights that decision makers need? Are you doing enough to protect your data centre and data from potential criminal activity or fraud?
Lead the creation of new value and agility for your business by optimising big data and analytics for faster insight at a lower cost.
Analytics Trends
1. Data Governance
2. Social Intelligence
3. Analytics Organisation-Wide
4. Community Collaboration
5. Integration of Everything
6. Cloud Analytics
7. Conversational Data
8. Journalism Data
9. Mature Mobility
10. Smart Analytics
Areas BIG DATA is Helping
1. Operations & Optimising
2. Product Development
3. Customer Experience
4. Understanding and Targeting Customers
Performance Examples
Actian is Helping These Companies Achieve Leadership
Digital Marketing: Hyper-segmentation every hour
Banking: Enterprise Risk every 2 minutes
Retail: Enterprise Market Basket Analysis every minute
Defense: Network intrusion models every second
Fraud: Adjustments every nanosecond
Amazon Redshift – Actian Matrix: Cloud-based, Petabyte-Scale Data Warehouse
The Value of Business Intelligence
Organisations competing with Analytics substantially OUTPERFORM their peers by 220%
Data Transformed
Actian Vector: Example
https://youtu.be/dYTF5ZNioEI
Identical 150 Million Transaction Query
Comparison between Actian Vector & Oracle DBMS
Harvest Analytics
Tom Park
Overview KNIME & Big Data
Gartner 2016 Magic Quadrant for Advanced Analytics Platforms
Niche Players (5): FICO, Lavastorm, Megaputer, Prognoz, Accenture
Leaders (5): SAS, IBM, KNIME, RapidMiner, Dell
Visionaries (4): Microsoft, Alteryx, Alpine Data Labs, Predixion
Challengers (2): SAP, Angoss
Changes from 2015 to 2016
X Salford & TIBCO: dropped for not satisfying the visual composition criterion
Main Big Data Technologies
NoSQL
Big Data Architecture
KNIME Big Data Extensions
Future Trends
Missing Ingredient to Success?
www.dataroos.com
Office of State Revenue
Anand Antony
KNIME @ OSR
Anand Antony
Senior Data Analyst, Operations Analytics and Intelligence
Office of State Revenue
OSR: Who are we?
As NSW’s principal revenue agency, OSR administers state taxation and revenue for, and on behalf of, the people of NSW
◦ Payroll tax
◦ Land tax
◦ Duties
◦ Grants such as First Home Benefits
Data Analytics Team: Who are we?
Operations Analytics & Intelligence is the analytics wing of the Operations Division in OSR
◦ Three teams – Business Intelligence, Data Analytics and Data Team
Data Analytics team consists of 10 analysts
Supports tax auditors by detecting possible non-compliant clients
◦ Via matching data from various sources and analysing them
◦ 60+ data sources
Data Analytics Scenario - Past
Data matching, preparation and analysis
◦ SPSS Clementine, SAS Enterprise Guide
Data mining
◦ Salford Systems
Reporting/Dashboards
◦ Excel
Fuzzy data matching
◦ SSA Name (Informatica)
Data Analytics Scenario - Current
Data matching, preparation and analysis
◦ KNIME (around 70% transitioned from Clementine/SAS)
Data mining
◦ Salford Systems
◦ Will be evaluating KNIME
Reporting/Dashboards
◦ Excel
Fuzzy data matching
◦ SSA Name (Informatica)
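SSA Name is a commercial product and its matching algorithm is not described in these slides. Purely to illustrate what fuzzy name matching does, the sketch below swaps in Python's standard-library `difflib.SequenceMatcher` similarity ratio after simple normalisation. The 0.85 threshold, the helper names and the sample names are hypothetical; a production matcher would usually also handle abbreviations, word order and phonetic variants.

```python
from difflib import SequenceMatcher
from typing import Optional


def normalise(name: str) -> str:
    """Lower-case, turn punctuation into spaces and collapse repeated whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the normalised names are identical."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def best_match(
    name: str, candidates: list[str], threshold: float = 0.85
) -> Optional[tuple[str, float]]:
    """Closest candidate and its score, or None if nothing reaches the threshold."""
    if not candidates:
        return None
    scored = [(candidate, similarity(name, candidate)) for candidate in candidates]
    candidate, score = max(scored, key=lambda pair: pair[1])
    return (candidate, score) if score >= threshold else None


# Hypothetical sample: a client name as typed in one source vs. names in another.
print(best_match("Smith & Sons Pty. Ltd", ["SMITH AND SONS PTY LTD", "Smythe Holdings"]))
```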
Future: Unified Analytic & Data Management Platform
(Diagram: Internal & External Data Sources; Data Governance, Data Quality, Data Matching and Metadata Management; MapR Hadoop Distribution with a Data Lake; Vortex/MapR; Advanced Data Analytics with Actian/KNIME; Machine Learning with H2O/Spark; Governance; Datamart; On the fly / Sandpit; Visualisation and Presentation Layer with Spotfire, Tableau; Graph DBs)
Why KNIME?
Enrich with coding via coding snippets
◦ Mostly Java Snippet at the moment
Start with canvas programming
Fast and easy learning curve for data scientists
Can tackle almost any analytic task
KNIME - Having the best of both worlds!
◦ Canvas programming and coding
What do we use KNIME for?
Pretty much for everything! (except reporting and data mining)
◦ Data reading (text files, databases, non-standard formats)
◦ Data merging (potentially fuzzy matching too in future)
◦ Data manipulation
◦ Creating new variables
◦ Data Output
◦ Modelling (possibly in future)
Key nodes/functionalities
◦ Sorter, Column Reorder, Column Filter, Column Rename
◦ Concatenate, Joiner, Reference Row Filter (anti-join)
◦ Missing Value
◦ Math Formula, String Manipulation, Rule Engine, Java Snippet
◦ GroupBy (aggregate, dedupe)
◦ Value Counter, Pivoting
◦ Looping
◦ Regular expressions/wildcards in various nodes
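Two of the nodes above correspond to simple table operations: Reference Row Filter set to exclude matches behaves as an anti-join (keep rows whose key is absent from a reference table), and GroupBy can dedupe. A minimal Python sketch of both on rows held as dictionaries; it is not KNIME's implementation, and the `abn` key and sample rows are hypothetical.

```python
from typing import Iterable

Row = dict[str, object]


def anti_join(rows: Iterable[Row], reference: Iterable[Row], key: str) -> list[Row]:
    """Keep only rows whose key value does NOT occur in the reference table."""
    reference_keys = {ref[key] for ref in reference}
    return [row for row in rows if row[key] not in reference_keys]


def dedupe_keep_last(rows: Iterable[Row], key: str) -> list[Row]:
    """One row per key value, keeping the last occurrence; comparable to a
    GroupBy on the key that takes the last value of every other column."""
    latest: dict[object, Row] = {}
    for row in rows:
        latest[row[key]] = row  # later rows overwrite earlier ones
    return list(latest.values())  # keys stay in first-seen order


# Hypothetical sample data.
CLIENTS = [
    {"abn": "1", "name": "A"},
    {"abn": "2", "name": "B"},
    {"abn": "2", "name": "B2"},
    {"abn": "3", "name": "C"},
]
ALREADY_AUDITED = [{"abn": "3"}]
print(anti_join(CLIENTS, ALREADY_AUDITED, "abn"))
print(dedupe_keep_last(CLIENTS, "abn"))
```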
Data Preparation Example
Case study 1
Officers fill in a questionnaire on each audited entity – one Excel spreadsheet per entity
Collate all the spreadsheets stored in a location
Massage the data to produce an analysis dataset with one row per entity
Key KNIME nodes/functionalities used
◦ List Files
◦ Table Row to Variable Loop Start, Loop End
◦ Java Snippet
Questionnaire data for one client
Overview of KNIME flow
Bring data to tabular form
Within this Meta node, there is one Java Snippet for each question in the questionnaire
Details of a Java Snippet
Result of the Meta Node
To get a single record for a client, just take the last row of each “client block”! (Explained in the next slide)
For each “client block” aggregate the variables
End result
1000 spreadsheets → 1000 rows
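The collation in case study 1 (list the files, loop over them, flatten each questionnaire into one record) can be sketched outside KNIME. This Python sketch assumes, hypothetically, that each spreadsheet has been exported to a two-column `question,answer` CSV named after the entity, since reading `.xlsx` directly would need a third-party library. Accumulating answers in a dictionary plays the role of keeping the last row of each “client block”.

```python
import csv
import tempfile
from pathlib import Path


def questionnaire_to_row(path: Path) -> dict[str, str]:
    """Flatten one questionnaire, assumed exported as a two-column
    question,answer CSV, into a single record keyed by question."""
    record = {"entity": path.stem}  # hypothetical: the file name identifies the entity
    with path.open(newline="", encoding="utf-8") as handle:
        for fields in csv.reader(handle):
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            question, answer = fields[0].strip(), fields[1].strip()
            record[question] = answer  # a later answer overwrites an earlier one
    return record


def collate(folder: Path) -> list[dict[str, str]]:
    """List the questionnaire files and loop over them: one output row per file."""
    return [questionnaire_to_row(path) for path in sorted(folder.glob("*.csv"))]


def demo() -> list[dict[str, str]]:
    """Write two tiny hypothetical questionnaires to a temporary folder and collate them."""
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        (folder / "entity_001.csv").write_text("Q1,Yes\nQ2,12\n", encoding="utf-8")
        (folder / "entity_002.csv").write_text("Q1,No\n\nQ2,7\n", encoding="utf-8")
        return collate(folder)


print(demo())
```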
Case study 2 – Use of Flow Variables
Technique
◦ Input metadata rules into a file
◦ Read and convert them into flow variables
Example
◦ Reorder variables in a dataset as per the order in the data dictionary
◦ We use the “Flow Variables” tab in the Column Reorder node’s dialog to achieve this
Use of flow variables
Use this tab
Do not use this “manual” tab
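The idea behind case study 2 is to keep the column order in a data file instead of a node's manual settings. A minimal Python sketch, assuming the data dictionary is a CSV with a hypothetical `column_name` field listed in the desired order; appending columns the dictionary does not mention at the end is a choice of this sketch, not necessarily what Column Reorder does.

```python
import csv
import io

# Hypothetical data dictionary: one line per column, in the required order.
DATA_DICTIONARY = """column_name,description
client_id,Client identifier
client_name,Registered name
land_value,Assessed land value
"""


def read_column_order(dictionary_csv: str) -> list[str]:
    """The 'metadata rules' file: read column names in dictionary order."""
    return [entry["column_name"] for entry in csv.DictReader(io.StringIO(dictionary_csv))]


def reorder(row: dict[str, object], order: list[str]) -> dict[str, object]:
    """Put dictionary columns first, in dictionary order, then any columns
    the dictionary does not mention, in their original order."""
    ordered = {name: row[name] for name in order if name in row}
    extras = {name: value for name, value in row.items() if name not in ordered}
    return {**ordered, **extras}


order = read_column_order(DATA_DICTIONARY)
print(list(reorder({"land_value": 500000, "notes": "", "client_id": 7, "client_name": "A"}, order)))
```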
KNIME wishlist! An offset function in some nodes
◦ e.g. Rule Engine, Math Formula
An offset function gives the value of a variable in a previous row. E.g. in SPSS Clementine, @OFFSET(var,1) gives the value in the previous row.
Note: Within a Java Snippet this is readily achieved, since a variable retains its value until it is overwritten. We can therefore first use the value populated from the previous row inside a formula, then update the variable with the current row’s value so that it can be used in the next row.
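The Java Snippet workaround in the note (read the retained value first, then overwrite it with the current row) fits naturally into a generator. This Python sketch generalises it to an offset of `n` rows with a bounded buffer, in the spirit of an `@OFFSET(var, n)` lookback; `offset` and `row_changes` are hypothetical helper names, not KNIME or Clementine functions.

```python
from collections import deque
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def _offset(values: Iterable[T], n: int) -> Iterator[Optional[T]]:
    buffer: deque = deque(maxlen=n)  # retained between rows, like a Java Snippet field
    for value in values:
        yield buffer[0] if len(buffer) == n else None  # use the retained value first
        buffer.append(value)  # then store the current row for later rows


def offset(values: Iterable[T], n: int = 1) -> Iterator[Optional[T]]:
    """For each row, the value n rows earlier (None until n rows have been seen)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return _offset(values, n)


def row_changes(values: Iterable[float]) -> list[Optional[float]]:
    """Difference from the previous row; None on the first row."""
    items = list(values)
    return [None if prev is None else cur - prev for cur, prev in zip(items, offset(items))]


print(row_changes([10, 12, 9]))
```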
Questions?
Data Transformed
Chhitesh Shrestha
Apache Spark on KNIME
Unleash the power of Big Data on Hadoop
The Big Data Problem: Data Volume
1. Storage is getting cheaper
2. Data sources are increasing
3. Thus, data is growing faster
YARN
But processing it is still a problem. Why?
The Big Data Problem: Processing
Now, memory is cheaper too.
Why Apache Spark?
Apache Spark is an open source parallel processing framework that enables users to run large-scale data analytics across clustered computers.
• Speed
• Flexible programming platform
• Generality
• Runs everywhere
Spark Components
Spark Comparison on Calculation of Average
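The slide's own comparison is not reproduced in this text. As a general illustration of how an average is computed on a cluster: partition averages cannot simply be averaged, so each partition emits per-key (sum, count) pairs that are merged before dividing. In PySpark this pattern is commonly written with `mapValues` and `reduceByKey`; the plain-Python sketch below only imitates partitions with a thread pool and is not Spark code.

```python
from concurrent.futures import ThreadPoolExecutor

Pair = tuple[str, float]
Partial = dict[str, tuple[float, int]]


def partial_sums(partition: list[Pair]) -> Partial:
    """Per-key (sum, count) for one partition; this is the 'map side' work."""
    acc: Partial = {}
    for key, value in partition:
        total, count = acc.get(key, (0.0, 0))
        acc[key] = (total + value, count + 1)
    return acc


def average_by_key(records: list[Pair], num_partitions: int = 4) -> dict[str, float]:
    """Average per key from merged (sum, count) pairs. Averaging partition
    averages would be wrong whenever partitions hold different numbers of
    values for a key."""
    partitions = [records[i::num_partitions] for i in range(num_partitions)]
    with ThreadPoolExecutor(max_workers=num_partitions) as pool:
        partials = list(pool.map(partial_sums, partitions))
    combined: Partial = {}
    for partial in partials:  # merge step: add sums and counts key by key
        for key, (total, count) in partial.items():
            seen_total, seen_count = combined.get(key, (0.0, 0))
            combined[key] = (seen_total + total, seen_count + count)
    return {key: total / count for key, (total, count) in combined.items()}


records = [("a", 1.0), ("b", 10.0), ("a", 3.0), ("b", 20.0), ("a", 5.0)]
print(average_by_key(records))
```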
List of Spark Nodes
Getting the data in and out of Spark
Data into Spark / Data out of Spark
Statistics and Data Manipulation Nodes
Statistics / Data Manipulation
Mining Nodes
Learners / Predictors
Other Nodes
KNIME Spark Executor Architecture
Currently Supported Hadoop and KNIME Versions
Hadoop Versions
• Hortonworks HDP 2.2 with Spark 1.2.x
• Hortonworks HDP 2.3 with Spark 1.3.x
• Cloudera CDH 5.3 with Spark 1.2.x
• Cloudera CDH 5.4 with Spark 1.3.x
KNIME Versions
• KNIME Analytics Platform 3.1
• KNIME Server 4.2
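Because the supported combinations are narrow, a pre-flight check before connecting a workflow to a cluster can be handy. A minimal sketch that encodes exactly the combinations listed above; it treats the list as exhaustive and matches versions by release-line prefix (so `1.3.1` counts as `1.3.x`), both of which are assumptions of this sketch rather than KNIME rules. The function names are hypothetical.

```python
# Combinations listed on the slide: (distribution, version) -> Spark release line.
SUPPORTED_CLUSTERS = {
    ("Hortonworks HDP", "2.2"): "1.2",
    ("Hortonworks HDP", "2.3"): "1.3",
    ("Cloudera CDH", "5.3"): "1.2",
    ("Cloudera CDH", "5.4"): "1.3",
}
SUPPORTED_KNIME = {"KNIME Analytics Platform": "3.1", "KNIME Server": "4.2"}


def in_release_line(line: str, version: str) -> bool:
    """True if version is the release line itself or a point release of it."""
    return version == line or version.startswith(line + ".")


def is_supported(
    distribution: str,
    version: str,
    spark_version: str,
    knime_product: str,
    knime_version: str,
) -> bool:
    """Check a planned setup against the listed combinations only."""
    spark_line = SUPPORTED_CLUSTERS.get((distribution, version))
    if spark_line is None or not in_release_line(spark_line, spark_version):
        return False
    knime_line = SUPPORTED_KNIME.get(knime_product)
    return knime_line is not None and in_release_line(knime_line, knime_version)


print(is_supported("Cloudera CDH", "5.4", "1.3.0", "KNIME Analytics Platform", "3.1"))
```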
Lots of talking… Let’s view a demo
Actian Vortex on Hadoop 10-minute Demo
http://videos.actian.com/watch/6iEZqvJrEKL2btoqIDImcg
Demonstration of Vortex, Dataflow & Vector
Comparison between Actian Vortex & Cloudera Impala