What’s New in DMX/DMX-h?
March 2017
Agenda
What’s New?
• Big Data + Quality
• DMX/DMX-h
• Big Data Integration
– Access
– Integrate
– Comply
– Simplify
– Extend
What’s Coming Soon?
Integrated Workflow Demo
Syncsort Confidential and Proprietary - do not copy or distribute
BIG DATA + QUALITY!
What’s New
Bringing Together Best-of-Breed Data Integration & Data Quality
“Existing customers and prospects can view this acquisition as positive. It extends Syncsort's information management capabilities through strengthened data quality and data governance functionality for the use cases they encounter.”
- “Syncsort Accelerates Data Quality With Trillium Acquisition Deal,” Gartner, December 6, 2016
Foundational Components of Any Enterprise Data Management Strategy
– Best-in-class data integration functionality & performance
– Early adopter & leader in Hadoop, Spark, Cloud, Real-time
– Extensive partner ecosystem, and out-of-the-box integration with Hadoop tools stack
– Most robust mainframe access & integration capabilities in market
– Best-in-class, broad data quality capabilities & functions
– Expertise in Cloud, Big Data & Real-time
– Most robust profiling, parsing, standardization and matching capabilities in the market
– Support breadth of verticals and business data quality objectives
DMX / DMX-H
What’s New
Syncsort DMX & DMX-h: Simple and Powerful Big Data Integration
DMX-h: High Performance Hadoop ETL Software
• GUI for developing MapReduce & Spark jobs
• Test & debug locally in Windows; deploy on Hadoop
• Use-case Accelerators to fast-track development
• Broad based connectivity with automated parallelism
• Simply the best mainframe access and integration with Hadoop
• Improved per node scalability and throughput
DMX: High Performance ETL Software
• Template driven design for:
o High performance ETL
o SQL migration/DB offload
o Mainframe data movement
• Light weight footprint on commodity hardware
• High speed flat file processing
• Self tuning engine
SIMPLIFY BIG DATA INTEGRATION
What’s New
Simplify Big Data Integration with Syncsort
Access
Get best in class data ingestion capabilities for Hadoop. Mainframes, RDBMS, MPP, JSON, Parquet, Avro, ORC, NoSQL, Kafka and more.
Access: Get Your Database data into Hadoop, At the Press of a Button
• Funnel hundreds of tables at once into your data lake
‒ Extract, map and move whole DB schemas in one invocation
‒ Extract from Oracle, DB2/z, MS SQL Server, Teradata and Netezza
‒ To SQL Server, Postgres, Hive, HDFS and S3
‒ Automatically create target Hive and HCat tables
• Process multiple funnels in parallel on edge node or data nodes
‒ Order data flows by dependencies
‒ Leverage DMX-h high performance data processing engine
• Extract only the data you want
‒ Data type filtering
‒ Table, record or column exclusion / inclusion
• In-flight transformations and cleansing
DMX DataFunnel™
Move thousands of tables in days, not weeks!
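The table selection and dependency ordering described on this slide can be illustrated with a small sketch. This is not DataFunnel's API or implementation: the function names, the wildcard patterns, and the sample schema are all hypothetical, and the ordering uses Python's standard `graphlib` module as a stand-in for whatever the product does internally.

```python
from fnmatch import fnmatchcase
from graphlib import TopologicalSorter  # standard library, Python 3.9+


def select_tables(tables, include=("*",), exclude=()):
    """Keep tables that match any include pattern and no exclude pattern."""
    return [
        t for t in tables
        if any(fnmatchcase(t, p) for p in include)
        and not any(fnmatchcase(t, p) for p in exclude)
    ]


def order_by_dependencies(tables, depends_on):
    """Return the tables so that each one comes after the tables it depends on."""
    selected = set(tables)
    # Ignore dependencies on tables that were filtered out.
    graph = {
        t: {d for d in depends_on.get(t, ()) if d in selected}
        for t in tables
    }
    return list(TopologicalSorter(graph).static_order())


# Hypothetical schema: ORDERS references CUSTOMERS, ORDER_ITEMS references ORDERS.
all_tables = ["CUSTOMERS", "ORDERS", "ORDER_ITEMS", "TMP_SCRATCH"]
depends_on = {"ORDERS": ["CUSTOMERS"], "ORDER_ITEMS": ["ORDERS"]}

selected = select_tables(all_tables, exclude=("TMP_*",))
plan = order_by_dependencies(selected, depends_on)
print(plan)
```

Filtering first and ordering second keeps excluded tables from blocking the plan; a dependency cycle would raise `graphlib.CycleError`, which a real tool would need to report to the user.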
Access: Bring ALL Enterprise Data Securely to the Data Lake
Database
– RDBMS
– MPP
– NoSQL
Mainframe
– DB2/z
– VSAM
– FTP Binary
– Mainframe Fixed
– Mainframe Variable
– Mainframe Distributable
– COBOL IT line sequential
– All file formats…
Big Data
– JSON
– Avro
– Parquet
– ORC
– Hive (Enhancements)
Streaming
– Kafka
– MapR Streams
– HDF (NiFi)
Cloud
– Amazon S3
– Amazon Redshift, RDS
– Google Cloud Storage
… And more!
Access: Hive Enhancements
Improvements to Hive support
JDBC connectivity
Support for partitioned tables: ORC, Parquet, Avro, HDFS
Support for Truncate and Insert
Automatic creation of Hive and other HCat supported tables
Direct distributed processing of Hive
Update of Hive statistics
Simplify Big Data Integration with Syncsort
Access Integrate
Single interface for streaming and batch processes. Single data pipeline for all enterprise data, batch or streaming.
Integrate: Single Interface for Streaming & Batch
Kafka, MapR Streams, Apache NiFi, and Spark!
Combine legacy batch and cutting edge streaming data sources
Easy development in GUI – no need to write Scala, C or Java code
Spark 2.0!
Simplify Streaming Data Integration
Globalization Enhancements
Improved Fujitsu NetCOBOL support
Localization
Support for multi-byte copybooks
Complete support of ALL ICU code pages
– Drop down list in GUI that provides most common code pages at the top
– Remembers most recent code page selection and pre-populates
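To make the code page point concrete, here is a minimal sketch of transcoding legacy-encoded bytes to UTF-8. It uses Python's built-in codecs, not ICU and not DMX itself; the helper name `transcode` and the sample bytes are assumptions chosen for illustration.

```python
def transcode(data, source_code_page, target="utf-8"):
    """Decode bytes from a source code page and re-encode them in the target encoding."""
    return data.decode(source_code_page).encode(target)


# Single-byte example: "HELLO" stored in EBCDIC code page 037.
ebcdic_bytes = b"\xc8\xc5\xd3\xd3\xd6"
print(transcode(ebcdic_bytes, "cp037"))

# Multi-byte example: Japanese text stored as Shift JIS.
sjis_bytes = "日本語".encode("shift_jis")
print(transcode(sjis_bytes, "shift_jis").decode("utf-8"))
```

Choosing the wrong source code page usually does not fail loudly for single-byte encodings; it silently produces the wrong characters, which is why an explicit code page selection matters.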
Simplify Big Data Integration with Syncsort
Access Integrate Comply
Secure data access, data governance and lineage. Seamless integration with Kerberos, Apache Ranger, Apache Ambari, Cloudera Manager, Cloudera Navigator and Sentry.
Comply: Manage
Cloudera Manager
– Deploy DMX-h across Cloudera cluster
– Monitor DMX-h jobs
Apache Ambari
– Deploy DMX-h across Hortonworks and other clusters
– Monitor DMX-h jobs
Cloudera Director
– Deploy DMX-h on Cloudera in the Cloud
– Elastically expand and reduce capacity as needed for spikes in workload
Comply: Govern
Metadata and data lineage for Hive, Avro and Parquet through HCatalog
Metadata lineage export from DMX/DMX-h
– Simplify audits, analytics dashboards, metrics
– Integrate with enterprise metadata repositories
Cloudera Navigator certified integration
– Extends HCatalog metadata
– HDFS, YARN, Spark and other metadata
– Lineage, tagging
– Business and structural metadata
Apache Atlas lineage integration
– Lineage, tagging
– Audit and track
(Technical preview available now)
Simplify Big Data Integration with Syncsort
Access Integrate Comply Simplify
Design once, deploy anywhere & insulate your organization from rapidly changing eco-system. Future proof your applications for new compute frameworks, on premise or in the cloud.
Simplify: Same Solution – On Premise or In the Cloud
• ETL engine on AWS Marketplace – Update to version 9.x
• Available on EC2, EMR, Google Cloud
• S3 and Redshift connectivity
• First & only leading ETL engine on Docker Hub
• Google Cloud Storage connectivity
Big Data + Cloud + Syncsort = Powerful, Flexible, Cost Effective
Intelligent Execution - Insulate your people from underlying complexities of Hadoop.
Simplify: Design Once, Deploy Anywhere
Use existing ETL skills.
No worries about mappers, reducers, big side, small side, and so on.
Automatic optimization for best performance, load balancing, etc.
No changes or tuning required, even if you change execution frameworks
Future-proof job designs for emerging compute frameworks, e.g. Spark 2.0.
Intelligent Execution Layer
One interface to design jobs to run on:
Single Node, Cluster
MapReduce 1, 2.x, Spark, Spark 2.0
Windows, Unix, Linux
On-Premise, Cloud
Batch, Streaming
Integrated Workflow
In a single job, combine any execution location, framework or style.
Ingest data on an edge node, then process on the cluster in a single workflow
Combine MapReduce ETL with Spark data analysis
Run extended tasks and custom functions in framework of your choice
ADD CUSTOM FUNCTIONALITY
Extend
Integrate: Easily Extend DMX / DMX-h with Custom Functions & Extended Tasks
• Enable data scientists to add new functions
• Ability to add custom transformation functions
– Shown in the GUI same as built-in functions
– Available via function pull-down and signature
• Ability to add job extensions to the data flow
• Publish a library in Syncsort GitHub
– Rounding Package
– Advanced Math Package
– Multiple Pivot options
Integrate: Extend User Base with Data Transformation Language (DTL)
• Metadata driven dynamic creation of DMX-h jobs
• Enables partners and end users to build on and extend DMX
• Human readable script-like interface for developing jobs
• Legacy ETL migrations to DMX
– Ability to import DTL to the DMX Graphical User Interface
– Maintain applications in the GUI
– Export metadata to DTL
WHAT’S NEXT?
Roadmap
Access: Keep Legacy and Modern Systems in Sync
• Capture changes in source database as they happen
• Update target systems automatically
• Capture changes in huge tables without straining network capacity
• Minimize impact to source database performance
Delta Change Data Capture
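As a rough illustration of the idea behind capturing deltas, the sketch below compares two snapshots of a table keyed by primary key and reports which rows were inserted, updated, or deleted. This is only one generic way to derive a delta and is an assumption for illustration: the slides do not describe how the product captures changes, and the function names and sample rows are hypothetical.

```python
import hashlib


def row_digest(row):
    """Stable digest of a row's values, used to detect updates cheaply."""
    joined = "\x1f".join(str(v) for v in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def compute_delta(previous, current):
    """Compare two snapshots keyed by primary key.

    Returns (inserts, updates, deletes) as sorted lists of keys.
    """
    prev_digests = {k: row_digest(r) for k, r in previous.items()}
    curr_digests = {k: row_digest(r) for k, r in current.items()}
    inserts = sorted(k for k in curr_digests if k not in prev_digests)
    deletes = sorted(k for k in prev_digests if k not in curr_digests)
    updates = sorted(
        k for k in curr_digests
        if k in prev_digests and curr_digests[k] != prev_digests[k]
    )
    return inserts, updates, deletes


# Hypothetical snapshots: key 2 changes, key 3 disappears, key 4 is new.
previous = {1: ("Ann", "NY"), 2: ("Bob", "LA"), 3: ("Cy", "SF")}
current = {1: ("Ann", "NY"), 2: ("Bob", "TX"), 4: ("Di", "DC")}

inserts, updates, deletes = compute_delta(previous, current)
print(inserts, updates, deletes)
```

Only the changed keys need to travel to the target system, which is the property the bullets above emphasize; comparing digests instead of full rows keeps the comparison itself small.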
Access: Hive Enhancements
Improvements to Hive support
JDBC connectivity
Support for partitioned tables: ORC, Parquet, Avro, HDFS
Support for Truncate and Insert
Automatic creation of Hive and other HCat supported tables
Distributed processing of Hive
Update of Hive statistics
Support for Hive tables with very complex arrays
Access: New User Experience for DataFunnel
DMX DataFunnel™
THANK YOU!