Introducing Apache NiFi
Yifeng Jiang, Solutions Engineer, Hortonworks
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
About Me
Yifeng Jiang
• Solutions Engineer, Hortonworks
• Apache HBase book author
• I like hiking
• Twitter: @uprush
Agenda
• Introduction to NiFi
• NiFi Demo
• NiFi Use Cases
Introduction to Apache NiFi
NiFi Overview
NiFi is an easy-to-use, powerful, and reliable system to process and distribute data.
NiFi Terminology
FlowFile
• Unit of data moving through the system
• Content + Attributes (key/value pairs)
Processor
• Performs the work, can access FlowFiles
Connection
• Links between processors
• Queues that can be dynamically prioritized
Process Group
• Set of processors and their connections
• Receive data via input ports, send data via output ports
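The terminology above can be sketched as a toy data model. This is only an illustration in Python, not NiFi's actual Java API: the class names mirror the slide, while `on_trigger`, `put`, `poll`, and `tag_size` are hypothetical names chosen for this example.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, Optional


@dataclass
class FlowFile:
    """Unit of data: content bytes plus key/value attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


class Connection:
    """Queue that links one processor to the next (plain FIFO here)."""

    def __init__(self) -> None:
        self._queue: Deque[FlowFile] = deque()

    def put(self, flow_file: FlowFile) -> None:
        self._queue.append(flow_file)

    def poll(self) -> Optional[FlowFile]:
        return self._queue.popleft() if self._queue else None


class Processor:
    """Performs the work: takes one FlowFile from its input, emits the result."""

    def __init__(self, work: Callable[[FlowFile], FlowFile],
                 incoming: Connection, outgoing: Connection) -> None:
        self.work = work
        self.incoming = incoming
        self.outgoing = outgoing

    def on_trigger(self) -> bool:
        flow_file = self.incoming.poll()
        if flow_file is None:
            return False  # nothing queued, nothing to do
        self.outgoing.put(self.work(flow_file))
        return True


def tag_size(flow_file: FlowFile) -> FlowFile:
    # Hypothetical unit of work: add an attribute, leave the content untouched.
    attrs = dict(flow_file.attributes, size=str(len(flow_file.content)))
    return FlowFile(flow_file.content, attrs)


source, sink = Connection(), Connection()
source.put(FlowFile(b"hello", {"filename": "a.txt"}))
Processor(tag_size, source, sink).on_trigger()
result = sink.poll()
print(result.attributes)  # {'filename': 'a.txt', 'size': '5'}
```

A Process Group would, in this picture, simply be a named set of such processors and connections with designated input and output ports.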
NiFi - User Interface
• Drag and drop processors to build a flow
• Start, stop, and configure components in real time
• View errors and corresponding error messages
• View statistics and health of data flow
• Create templates of common processors & connections
NiFi - Provenance
• Tracks data at each point as it flows through the system
• Records, indexes, and makes events available for display
• Handles fan-in/fan-out, i.e. merging and splitting data
• View attributes and content at given points in time
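As a rough illustration of how fan-out and fan-in can be tracked, the sketch below records one event per operation and walks parent links to reconstruct lineage. It is a simplified model, not NiFi's provenance repository: `ProvenanceLog` and its methods are hypothetical, and the event type strings are used only as labels.

```python
import itertools
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class ProvenanceEvent:
    event_type: str            # label such as "RECEIVE", "FORK", "JOIN"
    parents: Tuple[int, ...]   # FlowFile ids consumed by the event
    children: Tuple[int, ...]  # FlowFile ids produced by the event


class ProvenanceLog:
    """Hypothetical in-memory event log with lineage lookup."""

    def __init__(self) -> None:
        self.events: List[ProvenanceEvent] = []
        self._parents_of: Dict[int, Tuple[int, ...]] = {}
        self._ids = itertools.count(1)

    def _record(self, event_type: str, parents: Sequence[int], count: int) -> List[int]:
        children = [next(self._ids) for _ in range(count)]
        for child in children:
            self._parents_of[child] = tuple(parents)
        self.events.append(ProvenanceEvent(event_type, tuple(parents), tuple(children)))
        return children

    def receive(self) -> int:
        return self._record("RECEIVE", [], 1)[0]

    def fork(self, parent: int, count: int) -> List[int]:
        """Fan-out: one FlowFile is split into several."""
        return self._record("FORK", [parent], count)

    def join(self, parents: Sequence[int]) -> int:
        """Fan-in: several FlowFiles are merged into one."""
        return self._record("JOIN", parents, 1)[0]

    def lineage(self, flow_file_id: int) -> List[int]:
        """All ancestors of a FlowFile, found by walking parent links."""
        seen = set()
        stack = list(self._parents_of[flow_file_id])
        while stack:
            current = stack.pop()
            if current not in seen:
                seen.add(current)
                stack.extend(self._parents_of[current])
        return sorted(seen)


log = ProvenanceLog()
original = log.receive()          # id 1
parts = log.fork(original, 2)     # ids 2 and 3
merged = log.join(parts)          # id 4
print(log.lineage(merged))        # [1, 2, 3]
```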
NiFi - Queue Prioritization
• Configure a prioritizer per connection
• Determine what is important for your data – time based, arrival order, importance of a data set
• Funnel many connections down to a single connection to prioritize across data sets
• Develop your own prioritizer if needed
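The idea of a per-connection prioritizer can be sketched with a heap-backed queue. This is an illustrative Python model under stated assumptions, not NiFi's prioritizer interface: `PrioritizedConnection`, `by_priority_attribute`, and the `priority` attribute name are hypothetical.

```python
import heapq
import itertools
from typing import Any, Callable, Dict, List, Tuple

FlowFile = Dict[str, Any]  # simplified stand-in: just a dict of attributes


class PrioritizedConnection:
    """Queue whose ordering is decided by a pluggable prioritizer function."""

    def __init__(self, prioritizer: Callable[[FlowFile], Any]) -> None:
        self._prioritizer = prioritizer
        self._heap: List[Tuple[Any, int, FlowFile]] = []
        self._arrival = itertools.count()  # tie-breaker: arrival order

    def put(self, flow_file: FlowFile) -> None:
        key = self._prioritizer(flow_file)
        heapq.heappush(self._heap, (key, next(self._arrival), flow_file))

    def poll(self) -> FlowFile:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


def by_priority_attribute(flow_file: FlowFile) -> int:
    # Lower number means more important in this sketch.
    return int(flow_file["priority"])


# Funnel two data sets into a single connection and prioritize across them.
funnel = PrioritizedConnection(by_priority_attribute)
funnel.put({"name": "bulk-1", "priority": "5"})
funnel.put({"name": "alert-1", "priority": "1"})
funnel.put({"name": "bulk-2", "priority": "5"})

order = [funnel.poll()["name"] for _ in range(3)]
print(order)  # ['alert-1', 'bulk-1', 'bulk-2']
```

Swapping in a different function (for example one that returns an arrival timestamp) changes the ordering without touching the queue, which is the point of keeping the prioritizer pluggable.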
NiFi - Architecture
NiFi Cluster
• NiFi Cluster Manager
• NiFi Cluster Nodes
• Primary Node
• Isolated Processor
[Diagram: a master NiFi Cluster Manager (NCM) and slave NiFi nodes. The NCM runs in a JVM on its own OS/host, with a web server and a request replicator. Each node runs in a JVM on its own OS/host, with a web server and a flow controller hosting processors and extensions (Processor 1, Extension N), and keeps its FlowFile, content, and provenance repositories on local storage.]
NiFi Demo
NiFi Demo
• The demo cluster: Ambari deployment, NCM
• Real-time indexing in Solr & Banana
• NiFi UI
  • Flow statistics
  • Data provenance, event details, replay
• Add a Processor to push data to Kafka
• NiFi data on the node
  • FlowFile repository
  • Content repository
  • Provenance repository
Site to Site – Flow
[Screenshot labels: NiFi Cluster A (source); NiFi Cluster B (destination); Site to site; Remote Process Group; FlowFile attributes transferred]
Site to Site – Data Provenance
[Screenshot labels: NiFi Cluster A (source); NiFi Cluster B (destination); Event details at cluster B]
NiFi Use Cases
Use Cases – Index JSON
1. Pull in tweets using the Twitter API
2. Extract language and text into FlowFile attributes
3. Get non-empty English tweets: ${twitter.text:isEmpty():not():and(${twitter.lang:equals("en")})}
4. Merge together JSON documents based on quantity or time
5. Use dynamic field mappings to select fields for indexing:
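To make steps 3 and 4 concrete, here is a small Python sketch of the same logic outside NiFi. It mirrors the filter expression from step 3 (treating whitespace-only text as empty, which is an assumption about how `isEmpty` behaves) and merges by quantity only; time-based merging is left out. The function names and sample tweets are hypothetical.

```python
from typing import Dict, Iterable, Iterator, List

Attributes = Dict[str, str]


def is_non_empty_english(attributes: Attributes) -> bool:
    """Python reading of: twitter.text is not empty AND twitter.lang equals "en"."""
    text = attributes.get("twitter.text", "")
    return text.strip() != "" and attributes.get("twitter.lang") == "en"


def merge_by_count(items: Iterable[Attributes], max_count: int) -> Iterator[List[Attributes]]:
    """Group items into batches of at most max_count (quantity-based merge)."""
    batch: List[Attributes] = []
    for item in items:
        batch.append(item)
        if len(batch) == max_count:
            yield batch
            batch = []
    if batch:  # emit the final, possibly smaller, batch
        yield batch


tweets = [
    {"twitter.text": "NiFi is neat", "twitter.lang": "en"},
    {"twitter.text": "", "twitter.lang": "en"},           # dropped: empty text
    {"twitter.text": "こんにちは", "twitter.lang": "ja"},   # dropped: not English
    {"twitter.text": "Indexing tweets", "twitter.lang": "en"},
    {"twitter.text": "One more", "twitter.lang": "en"},
]

kept = [t for t in tweets if is_non_empty_english(t)]
batches = list(merge_by_count(kept, max_count=2))
print([len(b) for b in batches])  # [2, 1]
```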
Use Cases – Index a Relational Database
1. GenerateFlowFile acts as a timer to trigger ExecuteSQL (there are future plans to not require an incoming FlowFile for ExecuteSQL: NIFI-932)
2. ExecuteSQL performs a SQL query and streams the results as an Avro datafile. Use expression language to construct a dynamic date range: ${now():toNumber():minus(60000):format('yyyy-MM-dd')}
3. Convert Avro to JSON using the built-in ConvertAvroToJSON processor
4. Stream JSON update to Solr
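The date expression in step 2 takes the current time, subtracts 60000 milliseconds (one minute), and formats the result as a year-month-day string. A rough Python equivalent, with the hypothetical helper name `date_one_minute_ago`, shows what value ends up in the query:

```python
from datetime import datetime, timedelta


def date_one_minute_ago(now: datetime) -> str:
    """Subtract 60000 ms from `now` and format as year-month-day."""
    return (now - timedelta(milliseconds=60000)).strftime("%Y-%m-%d")


# Just after midnight, the one-minute offset falls on the previous day.
sample_now = datetime(2015, 11, 1, 0, 0, 30)
lower_bound = date_one_minute_ago(sample_now)
print(lower_bound)  # 2015-10-31
```

Because only the date part is kept, the value changes just once per day; a flow that needs minute-level ranges would have to format the time of day as well.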
Built-in Processors
• 90 built-in processors
• Well-defined API
• Easy to implement
Thank You