Spring XD Course Development Document

Spring XD Course Development Document (CDD) Draft

1. Course Description

This comprehensive 3-day course provides students with the skills needed to leverage Spring XD for data ingestion in a Big Data environment.

The hands-on training covers installation and administration of Spring XD; usage of the Spring XD Shell; creating, configuring, deploying, and scaling streams and jobs; and the development of custom modules, including sinks, sources, and jobs. Students will learn how to configure the product for various deployment scenarios, including high availability, distributed mode, and deployment to YARN.

Concepts presented will be reinforced with lab exercises.

2. Course Objectives

etc.

3. Delivery

Initially instructor-led training; web-based self-service training via Pivotal Academy is a possible addition.

4. Format

Open Office slides and note pages
Tar and README file for lab exercises
Product documentation

5. Audience

Users of Spring XD as a data ingestion tool for PHD/Hadoop and as a runtime for existing Spring Integration and Spring Batch projects.

6. Course Length

3 days. An optional 4th day may be added later, with a deep dive into developing custom modules with Spring Integration (SI) and Spring Batch (SB).

7. Prerequisites

Basic Java development experience. Spring and Hadoop knowledge is not required.

8. Development Timeline

The first version, ready for a beta delivery, is planned for December 31st, 2014.

9. Topics

General Introduction to Data Ingestion
    What is data ingestion, and why do we need it?
    What tools exist besides Spring XD? (Flume, Sqoop)
    Why do we need a batch system as well?
    Spring XD Overview

Install and Startup in Singlenode Mode
    Simple singlenode installation
    Setting up a simple stream (e.g. Http -> File)
    Creating a simple job (e.g. filejdbc)
Architecture Overview (Streams, Jobs, Taps)
Running in Distributed Mode, which needs to include these submodules:
    Spring XD DIRT (Admin, Container, ZooKeeper, RDBMS, Transport, Analytics Repo)
    Database setup for Spring Batch
    Apache ZooKeeper intro and setup
    Setting up a message broker for transport (Redis, RabbitMQ)
    Hadoop setup and quick Hadoop introduction (comparison with tools like Flume, Sqoop, Oozie?)
    Running DIRT on Hadoop YARN
HA and FT (either via DSL or UI)
    Demonstrate the cluster view to visualize deployed containers and their modules; kill containers; evaluate how modules get re-deployed automatically
    Container state can be queried via the ZK CLI, REST, or the Admin UI cluster view tab
Introduction to Streams
    Source, Processor, Sink overview with easy examples (time, file, http)
Customizing Stream Modules
    Quick Spring Integration overview and how to customize existing modules and deploy them as new modules
Advanced Streams
    More advanced examples with built-in modules (hdfs, script, reactor, twitter, ...)
    Best practices (what should be done with a stream, what not)

Introduction to Jobs
    Create and launch a simple built-in job (filepollhdfs, launch with Admin UI and absoluteFilePath param)
Customizing Job Modules
    Quick Spring Batch overview and how to customize existing modules and deploy them as new modules
Advanced Jobs
    Connect a job to a stream (filepollhdfs again)
    Best practices (what should be done with a job, what not)

Management and Monitoring Spring XD
Extending Spring XD
    Pluggable and decoupled architecture (transport bus, analytics store, module extensions, etc.)
Security in Spring XD

10. Lab Exercises

Most of the labs will run natively on the student machine. This lets us show how easy Spring XD is to set up, and it gives better performance and lower hardware requirements than VM-based labs. Labs using Hadoop will probably need to run in a VM, so we will keep the number of Hadoop labs to a minimum: only 2-3 labs will run in a VM (HDFS source/sink, Spring Batch with HDFS, MapReduce).

11. Appendix

Spring XD and Lambda Architecture
    I have slides and documentation to back this reference architecture
Spring XD in comparison with data ingest and stream analytics products such as Storm, Spark, Kafka, Oozie, Flume, Sqoop, etc.
    This could be a point of reference for students/enterprises to see how Spring XD can complement, replace, or compete with other products, thus showcasing how Spring XD can simplify Big Data development and deployment
Spring XD on PCF
    Introduction (very high-level) to PCF
    Initiative is at the early stages; WIP to develop Spring XD's MVP on PCF
Spring XD Enterprise Use Cases
    I have documentation on several enterprise use cases that can be shared to set context with real-world examples
Product Overview
    Specifications, APIs, example applications, technical documentation, etc.
Product Benchmarks
    Spring XD's QoS benchmarks and statistics compared with competing products (initiative at its early stages)

Pivotal Enablement and Education 3/31/2014
