Big Data Customer Education Webcast: The Latest Advancements in Syncsort DMX and DMX-h

What’s New in DMX/DMX-h?

March 2017

Agenda

What’s New?

• Big Data + Quality

• DMX/DMX-h

• Big Data Integration

– Access

– Integrate

– Comply

– Simplify

– Extend

What’s Coming Soon?

Integrated Workflow Demo

Syncsort Confidential and Proprietary - do not copy or distribute

BIG DATA + QUALITY!

What’s New


Bringing Together Best-of-Breed Data Integration & Data Quality


“Existing customers and prospects can view this acquisition as positive. It extends Syncsort's information management capabilities through strengthened data quality and data governance functionality for the use cases they encounter.”

- “Syncsort Accelerates Data Quality With Trillium Acquisition Deal,” Gartner, December 6, 2016

Foundational Components of Any Enterprise Data Management Strategy

– Best-in-class data integration functionality & performance

– Early adopter & leader in Hadoop, Spark, Cloud, Real-time

– Extensive partner ecosystem, and out-of-the-box integration with Hadoop tools stack

– Most robust mainframe access & integration capabilities in market

– Best-in-class, broad data quality capabilities & functions

– Expertise in Cloud, Big Data & Real-time

– Most robust profiling, parsing, standardization and matching capabilities in the market

– Support breadth of verticals and business data quality objectives

DMX / DMX-H

What’s New


Syncsort DMX & DMX-h: Simple and Powerful Big Data Integration

DMX-h: High Performance Hadoop ETL Software

• GUI for developing MapReduce & Spark jobs

• Test & debug locally in Windows; deploy on Hadoop

• Use-case Accelerators to fast-track development

• Broad based connectivity with automated parallelism

• Simply the best mainframe access and integration with Hadoop

• Improved per node scalability and throughput

DMX: High Performance ETL Software

• Template driven design for:

– High performance ETL

– SQL migration/DB offload

– Mainframe data movement

• Light weight footprint on commodity hardware

• High speed flat file processing

• Self tuning engine

SIMPLIFY BIG DATA INTEGRATION

What’s New


Simplify Big Data Integration with Syncsort


Access

Get best-in-class data ingestion capabilities for Hadoop: mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more.

Access: Get Your Database data into Hadoop, At the Press of a Button

• Funnel hundreds of tables at once into your data lake

– Extract, map and move whole DB schemas in one invocation

– Extract from Oracle, DB2/z, MS SQL Server, Teradata and Netezza

– To SQL Server, Postgres, Hive, HDFS and S3

– Automatically create target Hive and HCat tables

• Process multiple funnels in parallel on edge node or data nodes

– Order data flows by dependencies

– Leverage DMX-h high performance data processing engine

• Extract only the data you want

– Data type filtering

– Table, record or column exclusion / inclusion

• In-flight transformations and cleansing
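As an illustration of what "order data flows by dependencies" and "process multiple funnels in parallel" can mean in practice, here is a minimal Python sketch. It is not DataFunnel code and does not reflect how DMX implements ordering; the table names and the dependency map are hypothetical. It uses the standard library's `graphlib.TopologicalSorter` to group tables into batches, where every table in a batch has all of its prerequisites already loaded and could therefore be moved in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical schema: each table maps to the tables that must be loaded first.
dependencies = {
    "customers": set(),
    "products": set(),
    "orders": {"customers", "products"},
    "order_items": {"orders"},
}

def load_batches(deps):
    """Group tables into ordered batches; tables within a batch are independent."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are done
        batches.append(ready)
        sorter.done(*ready)
    return batches

batches = load_batches(dependencies)
for number, tables in enumerate(batches, start=1):
    print(f"batch {number}: {', '.join(tables)}")
```

Batching by readiness, rather than producing one flat ordering, keeps the parallelism visible: the two tables without prerequisites land in the first batch together.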


DMX DataFunnel™

Move thousands of tables in days, not weeks!

Access: Bring ALL Enterprise Data Securely to the Data Lake


Database

– RDBMS

– MPP

– NoSQL

Mainframe

– DB2/z

– VSAM

– FTP Binary

– Mainframe Fixed

– Mainframe Variable

– Mainframe Distributable

– COBOL IT line sequential

– All file formats…

Big Data

– JSON

– Avro

– Parquet

– ORC

– Hive (Enhancements)

Streaming

– Kafka

– MapR Streams

– HDF (NiFi)

Cloud

– Amazon S3

– Amazon Redshift, RDS

– Google Cloud Storage

… And more!

Access: Hive Enhancements

Improvements to Hive support

JDBC connectivity

Support for partitioned tables: ORC, Parquet, Avro, HDFS

Support for Truncate and Insert

Automatic creation of Hive and other HCatalog-supported tables

Direct distributed processing of Hive

Update of Hive statistics




Integrate


Single interface for streaming and batch processes. Single data pipeline for all enterprise data, batch or streaming.

Integrate: Single Interface for Streaming & Batch


Kafka, MapR Streams, Apache NiFi, and Spark!

Combine legacy batch and cutting edge streaming data sources

Easy development in GUI – no need to write Scala, C or Java code

Spark 2.0!

Simplify Streaming Data Integration

Globalization Enhancements


Improved Fujitsu NetCOBOL support

Localization

Support for multi-byte copybooks

Complete support of ALL ICU code pages

– Drop down list in GUI that provides most common code pages at the top

– Remembers most recent code page selection and pre-populates
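To make the code page and multi-byte points concrete, the sketch below converts data from a source code page to UTF-8. It is only an illustration under stated assumptions: it uses Python's built-in codecs as a stand-in for the ICU code pages named above, it does not show DMX or copybook handling, and the helper `to_utf8` and the sample field values are hypothetical.

```python
# Assumption: Python's built-in codecs stand in for ICU code pages here.
EBCDIC_US = "cp037"      # single-byte EBCDIC (US/Canada)
SHIFT_JIS = "shift_jis"  # multi-byte Japanese encoding

def to_utf8(raw: bytes, code_page: str) -> bytes:
    """Decode bytes from the source code page and re-encode them as UTF-8."""
    return raw.decode(code_page).encode("utf-8")

mainframe_field = b"\xc8\xc5\xd3\xd3\xd6"  # the text "HELLO" in EBCDIC cp037
japanese_field = "日本".encode(SHIFT_JIS)    # two characters, two bytes each

hello_utf8 = to_utf8(mainframe_field, EBCDIC_US)
japan_utf8 = to_utf8(japanese_field, SHIFT_JIS)

print(hello_utf8.decode("utf-8"))
print(len(japanese_field), "bytes in Shift_JIS,", len(japan_utf8), "bytes in UTF-8")
```

The second field shows why multi-byte support matters for fixed-width layouts: the character count and the byte count differ, and the byte count changes again after conversion.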



Comply



Secure data access, data governance and lineage. Seamless integration with Kerberos, Apache Ranger, Apache Ambari, Cloudera Manager, Cloudera Navigator and Sentry.

Comply: Manage

Cloudera Manager

– Deploy DMX-h across Cloudera cluster

– Monitor DMX-h jobs

Apache Ambari

– Deploy DMX-h across Hortonworks and other clusters

– Monitor DMX-h jobs

Cloudera Director

– Deploy DMX-h on Cloudera in the Cloud

– Elastically expand and reduce capacity as needed for spikes in workload

Comply: Govern


Metadata and data lineage for Hive, Avro and Parquet through HCatalog

Metadata lineage export from DMX/DMX-h

– Simplify audits, analytics dashboards, metrics

– Integrate with enterprise metadata repositories

Cloudera Navigator certified integration

– Extends HCatalog metadata

– HDFS, YARN, Spark and other metadata

– Lineage, tagging

– Business and structural metadata

Apache Atlas lineage integration

– Lineage, tagging

– Audit and track

(Technical preview available now)



Simplify



Design once, deploy anywhere, and insulate your organization from a rapidly changing ecosystem. Future-proof your applications for new compute frameworks, on premise or in the cloud.

Simplify: Same Solution – On Premise or In the Cloud

• ETL engine on AWS Marketplace – Update to version 9.x

• Available on EC2, EMR, Google Cloud

• S3 and Redshift connectivity

• First & only leading ETL engine on Docker Hub

• Google Cloud Storage connectivity


Big Data + Cloud + Syncsort = Powerful, Flexible, Cost Effective

Simplify: Design Once, Deploy Anywhere

Intelligent Execution - Insulate your people from underlying complexities of Hadoop.

Use existing ETL skills.

No worries about mappers, reducers, big side, small side, and so on.

Automatic optimization for best performance, load balancing, etc.

No changes or tuning required, even if you change execution frameworks

Future-proof job designs for emerging compute frameworks, e.g. Spark 2.0.

Intelligent Execution Layer

One interface to design jobs to run on:

Single Node, Cluster

MapReduce 1, 2.x, Spark, Spark 2.0

Windows, Unix, Linux

On-Premise, Cloud

Batch, Streaming

Integrated Workflow

In a single job, combine any execution location, framework or style.

Ingest data on an edge node, then process on the cluster in a single workflow

Combine MapReduce ETL with Spark data analysis

Run extended tasks and custom functions in framework of your choice


ADD CUSTOM FUNCTIONALITY

Extend



Integrate: Easily Extend DMX / DMX-h with Custom Functions & Extended Tasks

• Enable data scientists to add new functions

• Ability to add custom transformation functions

– Shown in the GUI same as built-in functions

– Available via function pull-down and signature

• Ability to add job extensions to the data flow

• Publish a library in Syncsort GitHub

– Rounding Package

– Advanced Math Package

– Multiple Pivot options


Integrate: Extend User Base with Data Transformation Language (DTL)

• Metadata driven dynamic creation of DMX-h jobs

• Enables partners and end users to build on and extend DMX

• Human readable script-like interface for developing jobs

• Legacy ETL migrations to DMX

– Ability to import DTL to the DMX Graphical User Interface

– Maintain applications in the GUI

– Export metadata to DTL

WHAT’S NEXT?

Roadmap


Access: Keep Legacy and Modern Systems in Sync

• Capture changes in source database as they happen

• Update target systems automatically

• Capture changes in huge tables without straining network capacity

• Minimize impact to source database performance

Delta Change Data Capture
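The slide does not say how the planned change capture works, so the following Python sketch only illustrates the general idea of shipping a delta instead of a full table. It swaps in a simple snapshot comparison keyed by primary key, which is not the same as capturing changes as they happen (that usually relies on database logs or triggers). The function names and sample rows are hypothetical.

```python
def compute_delta(previous, current):
    """Compare two snapshots keyed by primary key; return inserts, updates, deletes."""
    inserts = {key: row for key, row in current.items() if key not in previous}
    updates = {
        key: row
        for key, row in current.items()
        if key in previous and previous[key] != row
    }
    deletes = [key for key in previous if key not in current]
    return inserts, updates, deletes

def apply_delta(target, inserts, updates, deletes):
    """Bring a target copy in sync by applying only the changed rows."""
    synced = dict(target)
    synced.update(inserts)
    synced.update(updates)
    for key in deletes:
        synced.pop(key, None)
    return synced

# Hypothetical source table at two points in time.
yesterday = {1: ("Ann", "NY"), 2: ("Bob", "LA"), 3: ("Cy", "SF")}
today = {1: ("Ann", "NY"), 2: ("Bob", "TX"), 4: ("Dee", "WA")}

inserts, updates, deletes = compute_delta(yesterday, today)
target = apply_delta(yesterday, inserts, updates, deletes)
print(len(inserts) + len(updates) + len(deletes), "changed rows moved out of", len(today))
```

Only the changed rows cross the network, which is the point behind capturing changes in huge tables without straining capacity; the unchanged row with key 1 is never moved.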

Access: Hive Enhancements

Improvements to Hive support

JDBC connectivity

Support for partitioned tables: ORC, Parquet, Avro, HDFS

Support for Truncate and Insert

Automatic creation of Hive and other HCatalog-supported tables

Distributed processing of Hive

Update of Hive statistics

Support for Hive tables with very complex arrays


Access: New User Experience for DataFunnel


DMX DataFunnel™


THANK YOU!
