EDH offloading

EDH offloading by Sunil Sitaula

CONFIDENTIAL - RESTRICTED

EDH Off-load

October 15, 2014

Agenda

• What does it mean
• Why
• Approaches
• Things to consider
• Questions

What does it mean..

.. Moving data, applications, and users
.. From existing systems (enterprise data warehouses) to Cloudera Enterprise Data Hub (EDH)

Why..

.. A number of reasons
.. Cost
.. Flexibility: structured and unstructured data

Approaches..

.. Specific
   .. Use case
   .. Application
.. Partial
.. Full

Specific..

.. This is the way to start
.. Pick a use case or a small to medium non-critical application
.. Take it end-to-end

Why Specific..

.. Reveal ah-ha moments

.. Gain experience

.. Iron out support, operations, and admin issues

.. In some cases a complete switch may not be feasible; still go end-to-end, but feed the needed data back to the old system

Partial..

.. Now that in-house experience and expertise have been built, focus on extending the migration effort to other areas
.. Follow the same pattern, end-to-end

Full..

.. In some cases a full off-load may be feasible
.. But don’t fool yourself
.. Existing systems might have been there for years
.. They may have 100s of TB, hundreds of databases, thousands of tables, views, stored procs, scripts, macros, workflows, and reports, and dozens of apps pointed at them
.. This may entail having many partial off-loads staged, verified, and ready to go before a full migration

Planning..

.. How to keep existing systems in sync
   .. Feedback/keep-alive loop
   .. Processed data may need to be pumped back and forth
   .. Keeping IDs in sync (deciding the system of record)
.. Impact on existing environment
   .. While migrating existing data
   .. While keeping old and new systems in sync
   .. Number of connections
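
As a rough illustration of the "number of connections" point, the sketch below groups table imports into waves so that the combined number of parallel import tasks stays within a connection budget on the source system. It assumes each parallel task holds about one source connection, which is how Sqoop mappers typically behave. The function name `plan_waves`, the table names, and the budget values are hypothetical and not from the original deck.

```python
def plan_waves(tables, mappers_per_job, max_source_connections):
    """Group tables into waves whose combined mapper count fits the budget.

    Assumption: every mapper of every running import job holds one
    connection to the source database for the duration of the job.
    """
    if mappers_per_job < 1:
        raise ValueError("mappers_per_job must be at least 1")
    if mappers_per_job > max_source_connections:
        raise ValueError("a single job would exceed the connection budget")

    # How many import jobs may run at the same time.
    jobs_per_wave = max_source_connections // mappers_per_job

    # Slice the table list into consecutive waves of that size.
    return [
        tables[i:i + jobs_per_wave]
        for i in range(0, len(tables), jobs_per_wave)
    ]


if __name__ == "__main__":
    # Hypothetical table names and limits, for illustration only.
    tables = ["customers", "orders", "order_items", "products", "stores"]
    for number, wave in enumerate(plan_waves(tables, 4, 10), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

The budget itself is something to agree with the owners of the existing system, since the same connections are shared with production workloads during both the initial migration and the ongoing sync.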

Sqoop..

.. Will help significantly in migrating both data and schemas
.. Automate as much as possible
   .. Give a script a DB and a list of tables (or ones to avoid) and have it take care of the rest
.. But will still involve manual touch points
   .. Data types: not all data types may be supported
   .. Mappings
   .. Connectors: go through the options properly
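
The "give a script a DB and a list of tables" idea could look something like the minimal sketch below. It only builds `sqoop import` command lines and does not run them. The flags used (`--connect`, `--username`, `--password-file`, `--table`, `--hive-import`, `--hive-table`, `--num-mappers`, `--map-column-hive`) are standard Sqoop 1 import options; the function name, JDBC URL, table names, and the column override are hypothetical placeholders. The per-table overrides stand in for the manual data-type mapping touch points.

```python
import shlex


def build_sqoop_imports(jdbc_url, username, password_file, tables,
                        skip=(), num_mappers=4, column_overrides=None):
    """Return one `sqoop import` argument list per table not in `skip`."""
    column_overrides = column_overrides or {}
    skipped = {name.lower() for name in skip}
    commands = []
    for table in tables:
        if table.lower() in skipped:
            continue  # table explicitly excluded from the off-load
        cmd = [
            "sqoop", "import",
            "--connect", jdbc_url,
            "--username", username,
            "--password-file", password_file,
            "--table", table,
            "--hive-import",
            "--hive-table", table.lower(),
            "--num-mappers", str(num_mappers),
        ]
        # Manual touch point: columns whose source type needs an
        # explicit Hive type, given as {table: {column: hive_type}}.
        overrides = column_overrides.get(table)
        if overrides:
            mapping = ",".join(f"{col}={hive_type}"
                               for col, hive_type in overrides.items())
            cmd += ["--map-column-hive", mapping]
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # All values below are placeholders, not real systems.
    for cmd in build_sqoop_imports(
        jdbc_url="jdbc:mysql://dwh.example.com/sales",
        username="etl_user",
        password_file="/user/etl/.dwh_password",
        tables=["CUSTOMERS", "ORDERS", "AUDIT_LOG"],
        skip=["audit_log"],
        column_overrides={"ORDERS": {"CREATED_TS": "STRING"}},
    ):
        print(shlex.join(cmd))
```

Printing the commands first makes it easy to review the generated mappings before anything touches the source system. Sqoop also ships `import-all-tables` with `--exclude-tables`, which may cover the simple cases without a wrapper script.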

Key takeaways..

.. Start with specific use case

.. Identify dependencies and keep-alive processes

.. Avoid scope creep ("Oh no, we need that dataset too")

.. Engage developers, testers, and business owners early

.. Could be complex, but done properly it could result in significant savings, flexibility, and new capabilities

Questions