Top Five Reasons for Data Warehouse Modernization


  • Top Five Reasons for Data Warehouse Modernization

    Philip Russom, TDWI Research Director for Data Management

    May 28, 2014

  • Sponsor

  • Speakers

    Philip Russom, TDWI Research Director, Data Management

    Steve Sarsfield, Product Marketing Manager, HP Vertica

  • Agenda

    • Background

    – Why many users’ DWs need modernization

    – What is it?

    – There are many reasons, but I’ll boil it down to five

    • Top Five Reasons

    – Analytics

    – Scale

    – Speed

    – Productivity

    – Cost Control

    • New DW Architectures

    – Resulting from Modernization

    • Recommendations

    PLEASE TWEET @pRussom, #TDWI, #EDW, #DataWarehouse,

    #DataArchitecture, #Analytics, #RealTime

  • “DW Modernization” has many meanings…

    • Additions to existing data warehouse

    – New data subjects, sources, tables, dimensions, etc.

    • More standalone data platforms and tools

    – Complement DW without replacing it

    – More marts and ODSs

    – New appliances, columnar databases, Hadoop, NoSQL, etc.

    • Architectural Adjustments

    – All the above

    – Better design

    • Upgrades

    – Newer versions of current DBMS software

    – More hardware

    • Rip and Replace

    – Decommission current DW platform and migrate to another

  • Contact Information

    If you have further questions or comments:

    Philip Russom, TDWI: prussom@tdwi.org

    Randy Lea, Teradata: randy.lea@teradata.com

  • Top Five Goals for DW Modernization

    • I’ll mostly focus on improvements to:

    – Analytics, Scale, Speed

    • These regularly rank high in TDWI surveys, for example:

    1. ANALYTICS

    2. SCALE

    3. SPEED

    SOURCE: 2014 TDWI Report: Evolving Data Warehouse Architectures, Figure 4

    • I’ll also mention improvements to:

    – Productivity, Cost Control

    • These regularly come up in TDWI interviews with users

  • DW Modernization Goals are Related

    • Analytics needs better productivity

    • The challenge is to gain improvements with the first four goals without incurring more of the fifth: cost.

    • Speed contributes to scale and productivity

    HIGH PERFORMANCE DATA WAREHOUSING (HiPer DW)

    CONCURRENCY

    • Competing Workloads

    • Reporting, Real Time, OLAP, Adv. Analytics, etc.

    • Intra-Day Data Loads

    • Thousands of Users

    • Ad hoc Queries

    SCALE

    • Big Data Volumes

    • Detailed Source Data

    • Thousands of Reports

    • Scale Out Into: Clouds, clusters, grids, distributed architectures

    SPEED

    • Streaming Big Data

    • Event Processing

    • Real-Time Operation

    • Operational BI

    • Near-Time Analytics

    • Dashboard Refresh

    • Fast Queries

    COMPLEXITY

    • Big Data Variety

    • Unstructured Data

    • Machine/sensor Data

    • Web & Social Media

    • Many Sources/Targets

    • Complex Models & SQL

    • High Availability

    SOURCE: 2012 TDWI Report: High Performance Data Warehousing, Figure 1.

  • BEYOND OLAP & REPORTING TO

    Advanced Analytics

    • Organizations need more analytic insights

    – To compete, serve customers, be profitable, control costs, improve quality, grow, etc.

    • Analytics is becoming a larger portion of BI work

    – Reporting and OLAP are still important

    • Organizations need advanced forms of analytics

    – Technologies: Extreme SQL, data mining, statistics, natural language processing, text mining, AI, graph, etc.

    – Methods: Predictive, clustering, segmentation, risk, fraud detection, etc.

    • Most users designed EDWs for reporting and OLAP

    – Analytics’ requirements differ from reports and OLAP

    • Users face multiple paths to enabling advanced analytics

    – Retrofit analytics onto report-focused EDW

    – Deploy an analytic data platform that complements the EDW

    – Replace the EDW’s platform with one that handles all workloads

  • Scale TO MORE DATA, USERS, REPORTS, ANALYSES…

    • Data’s Growing Volumes are a Challenge

    – Large Data Warehouses: data for both reporting and analytics

    – Big Data: volume aside, also diversity of data type, source, latency

    • Scale is also a Challenge to Basic BI Functions, like Reporting

    – Thousands of Concurrent BI Users; Thousands of Reports

    – Eventually, thousands of analytic users

    • Scale to Increasing Complexity

    – More processing for ETL, integration, quality, analytics, real time, etc.

    – Distributed DW architectures have more moving parts

    • Scale despite Growing Numbers of Concurrent Workloads

    – Reporting, Real Time, OLAP, Analytics, Data Loads, Ad hoc Queries…

    • Users have a number of choices for scaling

    – Scale Up: More hardware for more data; efficient storage

    – Scale Out: Clouds, clusters, grids, racks, distributed architectures

    – Deploy or migrate to data platforms built for analytics with big data: columnar databases, data warehouse appliances, newer brands of databases, Hadoop, NoSQL, etc.

  • EVERYTHING NEEDS MORE

    Speed

    • Speed involves a temporal continuum

    – From high performance to near time and true real time

    • Speed is enabled by a functional continuum

    – From hardware to perky queries to event processing

    – Many options are available for modernizing EDWs and analytics

    • High performance functionality

    – In-memory databases, in-database analytics, columnar databases, DW appliances, solid-state drives, modern CPUs, big memory in servers, etc.

    • Near-time functionality

    – Microbatches, federation, virtualization, replication, services, query optimization, etc.

    • Real-time functionality

    – Complex event processing (CEP), stream processing, operational intelligence, etc.
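To make the "microbatches" item under near-time functionality concrete, here is a minimal sketch in Python. It is an illustration only, not something taken from the slides: the function name `microbatches` and the size-based trigger are assumptions, and production loaders often flush on a time interval as well.

```python
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def microbatches(events: Iterable[T], batch_size: int) -> Iterator[List[T]]:
    """Group an incoming stream of events into small batches.

    Hypothetical helper: each yielded list stands for one small load
    into the warehouse, instead of one large nightly batch.
    """
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: List[T] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch      # hand a full microbatch to the loader
            batch = []       # start collecting the next one
    if batch:
        yield batch          # flush the final, partially filled batch


if __name__ == "__main__":
    for load in microbatches(range(5), batch_size=2):
        print(load)
```

Smaller batches shorten the delay between an event and its availability for queries, at the price of more frequent load operations.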

  • MORE SOLUTIONS IN LESS TIME

    Productivity

    • Agile and lean development methods

    – Early prototype, built out iteratively

      • Instead of older “big bang” deliverables

    – Biz folks review/guide each iteration

      • To assure IT-to-biz alignment

    • Requirements gathering (RG) now done online

    – Data exploration, discovery, profiling replace RG

    – Req’s captured online, applied directly to solution

    • Fast tools and platforms make analytics productive

    – “Speed of thought” iterative analysis

    – Fast queries & bulk loads build analytic datasets fast

    • Less time per project means

    – More projects

    – Organization uses solution sooner

    – Greater agility for the business

  • DATA VARIES IN VALUE; MANAGE IT ACCORDINGLY

    Economics

    • As you modernize a DW environment, rethink its economics

    • Cost continuum of data platforms (high $/TB first):

    – Traditional Platforms

    – New Affordable Platforms, built for DW/Analytics

    – Cheap Open Source: Hadoop, NoSQL

    • Choose a platform that fits a given data workload, but also fits the value of data

    – High-value data on the core EDW

      • Modeling, cleansing, aggregating, and documenting data (which is required for reports and OLAP) increases its value

    – Analytic datasets in the mid tier

      • This data is lightly prepared or prepped on the fly; temp sandboxes

    – Source & archival data on the back tier

      • This is more of a “data lake” that preserves data in its original form, so it can be repurposed repeatedly, as analytic projects arise
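The three-tier placement rule on this slide can be sketched as a small decision function. This is a hedged illustration: the `Dataset` fields, the tier constants, and the `choose_tier` function are hypothetical names introduced here, and a real placement decision would also weigh the workload and the cost of each platform.

```python
from dataclasses import dataclass

# Hypothetical labels for the three tiers named on the slide.
CORE_EDW = "core EDW"
MID_TIER = "mid tier"
BACK_TIER = "back tier"


@dataclass
class Dataset:
    name: str
    modeled: bool        # modeled, cleansed, aggregated, and documented
    feeds_reports: bool  # used for reports or OLAP
    analytic: bool       # lightly prepared analytic dataset or sandbox


def choose_tier(ds: Dataset) -> str:
    """Place a dataset on the tier that matches its value (assumed rule)."""
    if ds.modeled and ds.feeds_reports:
        return CORE_EDW   # high-value, fully prepared data
    if ds.analytic:
        return MID_TIER   # lightly prepared data for analysis
    return BACK_TIER      # source or archival data kept in original form


if __name__ == "__main__":
    examples = [
        Dataset("monthly_revenue", modeled=True, feeds_reports=True, analytic=False),
        Dataset("churn_sandbox", modeled=False, feeds_reports=False, analytic=True),
        Dataset("raw_weblogs", modeled=False, feeds_reports=False, analytic=False),
    ]
    for ds in examples:
        print(ds.name, "->", choose_tier(ds))
```

The ordering of the checks encodes the slide's priority: prepared reporting data goes to the most expensive tier, and everything without a current use defaults to the cheapest one.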

  • ONE WAY TO MODERNIZE A DW

    Multi-Platform Data Warehouse Environments

    • Many enterprise data warehouses (EDWs) are evolving into multi-platform data warehouse environments (DWEs).

    • Users continue to add standalone data platforms to their warehouse tool and platform portfolio.

    • The new platforms don’t replace the core warehouse, because it is still the best platform for the data that goes into standard reports, dashboards, performance management, and OLAP.

    • Instead, the new platforms complement the warehouse, because they are optimized for workloads that manage, process, and analyze new forms of big data, non-structured data, and real-time data.

  • Modern DW System Architectures can be Complex

    • The technology stack for DW, BI, analytics, and data integration has always been a multi-platform environment.

    • What’s new? The trend toward a portfolio of many data platforms has accelerated.

    • Why? More platform types to serve more data and workload types.

    Complex Event Processing

    Streaming Data Tools

    Analytic Sandbox