What’s New in DMX/DMX-h? March 2017

Big Data Customer Education Webcast: The Latest Advancements in Syncsort DMX and DMX-h

Agenda

What’s New?

• Big Data + Quality

• DMX/DMX-h

• Big Data Integration

– Access

– Integrate

– Comply

– Simplify

– Extend

What’s Coming Soon?

Integrated Workflow Demo

Syncsort Confidential and Proprietary - do not copy or distribute

BIG DATA + QUALITY!

What’s New

Bringing Together Best-of-Breed Data Integration & Data Quality

“Existing customers and prospects can view this acquisition as positive. It extends Syncsort's information management capabilities through strengthened data quality and data governance functionality for the use cases they encounter.”

- “Syncsort Accelerates Data Quality With Trillium Acquisition Deal,” Gartner, December 6, 2016

Foundational Components of Any Enterprise Data Management Strategy

– Best-in-class data integration functionality & performance

– Early adopter & leader in Hadoop, Spark, Cloud, Real-time

– Extensive partner ecosystem, and out-of-the-box integration with Hadoop tools stack

– Most robust mainframe access & integration capabilities in market

– Best-in-class, broad data quality capabilities & functions

– Expertise in Cloud, Big Data & Real-time

– Most robust profiling, parsing, standardization and matching capabilities in the market

– Support breadth of verticals and business data quality objectives

DMX / DMX-H

What’s New

Syncsort DMX & DMX-h: Simple and Powerful Big Data Integration

DMX-h: High Performance Hadoop ETL Software

• GUI for developing MapReduce & Spark jobs
• Test & debug locally in Windows; deploy on Hadoop
• Use-case Accelerators to fast-track development
• Broad-based connectivity with automated parallelism
• Simply the best mainframe access and integration with Hadoop
• Improved per-node scalability and throughput

DMX: High Performance ETL Software

• Template-driven design for:
  o High performance ETL
  o SQL migration/DB offload
  o Mainframe data movement
• Lightweight footprint on commodity hardware
• High-speed flat file processing
• Self-tuning engine

SIMPLIFY BIG DATA INTEGRATION

What’s New

Simplify Big Data Integration with Syncsort

Access

Get best-in-class data ingestion capabilities for Hadoop: mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more.

Access: Get Your Database data into Hadoop, At the Press of a Button

• Funnel hundreds of tables at once into your data lake
  ‒ Extract, map and move whole DB schemas in one invocation
  ‒ Extract from Oracle, DB2/z, MS SQL Server, Teradata and Netezza
  ‒ To SQL Server, Postgres, Hive, HDFS and S3
  ‒ Automatically create target Hive and HCat tables
• Process multiple funnels in parallel on edge node or data nodes
  ‒ Order data flows by dependencies
  ‒ Leverage DMX-h high performance data processing engine
• Extract only the data you want
  ‒ Data type filtering
  ‒ Table, record or column exclusion / inclusion
• In-flight transformations and cleansing

DMX DataFunnel™

Move thousands of tables in days, not weeks!
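Two ideas from the DataFunnel slide, table inclusion/exclusion and ordering data flows by dependencies, can be illustrated with a small generic sketch. This is not DataFunnel's interface or implementation; the function names, table names, and the pattern syntax are all hypothetical, and the ordering simply uses a standard topological sort.

```python
from fnmatch import fnmatchcase
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def select_tables(tables, include=("*",), exclude=()):
    """Keep tables that match any include pattern and no exclude pattern."""
    return [
        t for t in tables
        if any(fnmatchcase(t, p) for p in include)
        and not any(fnmatchcase(t, p) for p in exclude)
    ]


def order_by_dependencies(tables, depends_on):
    """Order tables so each one comes after the tables it depends on.

    Dependencies on tables that were filtered out are ignored.
    """
    selected = set(tables)
    graph = {
        t: {d for d in depends_on.get(t, ()) if d in selected}
        for t in tables
    }
    return list(TopologicalSorter(graph).static_order())


# Hypothetical schema: names and dependencies are illustrative only.
tables = ["CUSTOMERS", "ORDERS", "ORDER_ITEMS", "TMP_SCRATCH"]
depends_on = {"ORDERS": ["CUSTOMERS"], "ORDER_ITEMS": ["ORDERS"]}

selected = select_tables(tables, exclude=["TMP_*"])
plan = order_by_dependencies(selected, depends_on)
print(plan)
```

In this sketch the scratch table is excluded before ordering, so the resulting plan only contains tables that will actually be moved.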

Access: Bring ALL Enterprise Data Securely to the Data Lake

Database

– RDBMS

– MPP

– NoSQL

Mainframe

– DB2/z

– VSAM

– FTP Binary

– Mainframe Fixed

– Mainframe Variable

– Mainframe Distributable

– COBOL IT line sequential

– All file formats…

Big Data

– JSON

– Avro

– Parquet

– ORC

– Hive (Enhancements)

Streaming

– Kafka

– MapR Streams

– HDF (NiFi)

Cloud

– Amazon S3

– Amazon Redshift, RDS

– Google Cloud Storage

… And more!

Access: Hive Enhancements

Improvements to Hive support

JDBC connectivity

Support for partitioned tables: ORC, Parquet, Avro, HDFS

Support for Truncate and Insert

Automatic creation of Hive and other HCat-supported tables

Direct distributed processing of Hive

Update of Hive statistics

Simplify Big Data Integration with Syncsort

Integrate

Single interface for streaming and batch processes. Single data pipeline for all enterprise data, batch or streaming.

Integrate: Single Interface for Streaming & Batch

Kafka, MapR Streams, Apache NiFi, and Spark!

Combine legacy batch and cutting edge streaming data sources

Easy development in GUI – no need to write Scala, C or Java code

Spark 2.0!

Simplify Streaming Data Integration

Globalization Enhancements

Improved Fujitsu NetCOBOL support

Localization

Support for multi-byte copybooks

Complete support of ALL ICU code pages

– Drop down list in GUI that provides most common code pages at the top

– Remembers most recent code page selection and pre-populates
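To make the code page point concrete, here is a minimal sketch of what converting between code pages involves. It uses Python's built-in codecs rather than ICU or DMX, so the `transcode` helper and the code page names (`cp037` for a US/Canada EBCDIC page, `shift_jis` for a multi-byte Japanese encoding) are assumptions chosen only for illustration.

```python
def transcode(data: bytes, source_codepage: str, target_codepage: str = "utf-8") -> bytes:
    """Decode bytes from one code page and re-encode them in another."""
    return data.decode(source_codepage).encode(target_codepage)


# Single-byte EBCDIC text, as a mainframe extract might contain.
ebcdic_bytes = "Hello".encode("cp037")
as_utf8 = transcode(ebcdic_bytes, "cp037")

# Multi-byte text: each of these characters takes two bytes in Shift JIS.
japanese = "データ"
sjis_bytes = japanese.encode("shift_jis")
sjis_as_utf8 = transcode(sjis_bytes, "shift_jis")

print(as_utf8.decode("utf-8"), len(sjis_bytes), sjis_as_utf8.decode("utf-8"))
```

The same bytes mean different things under different code pages, which is why a job has to know the source code page before it can interpret a field.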

Simplify Big Data Integration with Syncsort

Comply

Secure data access, data governance and lineage. Seamless integration with Kerberos, Apache Ranger, Apache Ambari, Cloudera Manager, Cloudera Navigator and Sentry.

Comply: Manage

Cloudera Manager
– Deploy DMX-h across Cloudera cluster
– Monitor DMX-h jobs

Apache Ambari
– Deploy DMX-h across Hortonworks and other clusters
– Monitor DMX-h jobs

Cloudera Director
– Deploy DMX-h on Cloudera in the Cloud
– Elastically expand and reduce capacity as needed for spikes in workload

Comply: Govern

Metadata and data lineage for Hive, Avro and Parquet through HCatalog

Metadata lineage export from DMX/DMX-h
– Simplify audits, analytics dashboards, metrics
– Integrate with enterprise metadata repositories

Cloudera Navigator certified integration
– Extends HCatalog metadata
– HDFS, YARN, Spark and other metadata
– Lineage, tagging
– Business and structural metadata

Apache Atlas lineage integration (technical preview available now)
– Lineage, tagging
– Audit and track

Simplify Big Data Integration with Syncsort

Simplify

Design once, deploy anywhere, and insulate your organization from a rapidly changing ecosystem. Future-proof your applications for new compute frameworks, on premises or in the cloud.

Simplify: Same Solution – On Premise or In the Cloud

• ETL engine on AWS Marketplace – Update to version 9.x

• Available on EC2, EMR, Google Cloud

• S3 and Redshift connectivity

• First & only leading ETL engine on Docker Hub

• Google Cloud Storage connectivity

Big Data + Cloud + Syncsort = Powerful, Flexible, Cost Effective

Intelligent Execution - Insulate your people from underlying complexities of Hadoop.

Simplify: Design Once, Deploy Anywhere

Use existing ETL skills.

No worries about mappers, reducers, big side, small side, and so on.

Automatic optimization for best performance, load balancing, etc.

No changes or tuning required, even if you change execution frameworks

Future-proof job designs for emerging compute frameworks, e.g. Spark 2.0.

Intelligent Execution Layer

One interface to design jobs to run on:

Single Node, Cluster

MapReduce 1, 2.x, Spark, Spark 2.0

Windows, Unix, Linux

On-Premise, Cloud

Batch, Streaming

Integrated Workflow

In a single job, combine any execution location, framework or style.

Ingest data on an edge node, then process on the cluster in a single workflow

Combine MapReduce ETL with Spark data analysis

Run extended tasks and custom functions in framework of your choice

ADD CUSTOM FUNCTIONALITY

Extend

Integrate: Easily Extend DMX / DMX-h with Custom Functions & Extended Tasks

• Enable data scientists to add new functions

• Ability to add custom transformation functions

– Shown in the GUI same as built-in functions

– Available via function pull-down and signature

• Ability to add job extensions to the data flow

• Publish a library in Syncsort GitHub

– Rounding Package

– Advanced Math Package

– Multiple Pivot options

Integrate: Extend User Base with Data Transformation Language (DTL)

• Metadata driven dynamic creation of DMX-h jobs

• Enables partners and end users to build on and extend DMX

• Human readable script-like interface for developing jobs

• Legacy ETL migrations to DMX

– Ability to import DTL to the DMX Graphical User Interface

– Maintain applications in the GUI

– Export metadata to DTL

WHAT’S NEXT?

Roadmap

Access: Keep Legacy and Modern Systems in Sync

• Capture changes in source database as they happen

• Update target systems automatically

• Capture changes in huge tables without straining network capacity

• Minimize impact to source database performance

Delta Change Data Capture
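The slide does not say how the delta is computed. As one generic illustration of the idea (not Syncsort's method), the sketch below compares per-row digests from two snapshots so that only keys and short hashes, rather than full rows, need to be compared. All function names and the sample rows are hypothetical.

```python
import hashlib


def row_digest(row):
    """Hash a row's values so rows can be compared without shipping them whole."""
    joined = "\x1f".join(str(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def snapshot_digests(rows, key_index=0):
    """Map each row's primary key to the digest of the full row."""
    return {row[key_index]: row_digest(row) for row in rows}


def delta(old_digests, new_digests):
    """Classify keys as inserted, updated, or deleted between two snapshots."""
    inserts = sorted(k for k in new_digests if k not in old_digests)
    deletes = sorted(k for k in old_digests if k not in new_digests)
    updates = sorted(
        k for k in new_digests
        if k in old_digests and new_digests[k] != old_digests[k]
    )
    return {"insert": inserts, "update": updates, "delete": deletes}


# Illustrative snapshots of a small table: (id, name, city).
old_rows = [(1, "Ann", "NY"), (2, "Bob", "LA"), (3, "Cy", "SF")]
new_rows = [(1, "Ann", "NY"), (2, "Bob", "TX"), (4, "Di", "DC")]

changes = delta(snapshot_digests(old_rows), snapshot_digests(new_rows))
print(changes)
```

Only the rows whose keys appear in `changes` would then need to be applied to the target system, which is the property the bullets above describe.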

Access: Hive Enhancements

Improvements to Hive support

JDBC connectivity

Support for partitioned tables: ORC, Parquet, Avro, HDFS

Support for Truncate and Insert

Automatic creation of Hive and other HCat-supported tables

Distributed processing of Hive

Update of Hive statistics

Support for Hive tables with very complex arrays

Access: New User Experience for DataFunnel

DMX DataFunnel™

THANK YOU!