Beyond Messaging: Enterprise Dataflow powered by Apache NiFi
© Hortonworks Inc. 2011 – 2015. All Rights Reserved
Aldrin Piri, 19 January 2016
About me
Senior Member of Technical Staff
Project Management Committee and Committer
@aldrinpiri
Simplistic View of Enterprise Data Flow
The Data Flow Thing
Acquire Data
Process and Analyze Data
Store Data
Realistic View of Data Flow
Global interactions with customers, business partners, and things spanning different volume, velocity, bandwidth, and latency needs
Meeting Edge Requirements
GATHER
DELIVER
PRIORITIZE
Track from the edge through to the datacenter
Small Footprints: operate with very little power
Limited Bandwidth: can create high latency
Data Availability: exceeds transmission bandwidth
Data Must Be Secured: throughout its journey
Where do we find data flow?
• Remote sensor delivery (Internet of Things - IoT)
• Intra-site / Inter-site / global distribution (Enterprise)
• Ingest for driving analytics (Big Data)
• Data Processing (Simple Event Processing)
Basics of Connecting Systems
For every connection, these must agree:
1. Protocol
2. Format
3. Schema
4. Priority
5. Size of event
6. Frequency of event
7. Authorization access
8. Relevance
[Diagram: producer P1 connected to consumer C1]
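The eight points of agreement can be made concrete with a small sketch. This is only an illustration, not anything NiFi ships: the `Endpoint` class, its field names, and the sample values are all hypothetical, chosen to mirror the list on this slide. The helper reports which attributes a producer and a consumer disagree on.

```python
from dataclasses import dataclass, fields
from typing import List


@dataclass(frozen=True)
class Endpoint:
    """What one side of a connection expects (hypothetical model of the slide's list)."""
    protocol: str
    data_format: str
    schema: str
    priority: str
    event_size: str
    event_frequency: str
    authorization: str
    relevance: str


def mismatches(producer: Endpoint, consumer: Endpoint) -> List[str]:
    """Return the names of the attributes on which the two sides disagree."""
    return [
        f.name
        for f in fields(Endpoint)
        if getattr(producer, f.name) != getattr(consumer, f.name)
    ]


# Sample values are made up for illustration.
p1 = Endpoint("kafka", "avro", "v2", "fifo", "batch", "hourly", "topic-level", "all events")
c1 = Endpoint("http", "json", "v2", "fifo", "batch", "hourly", "topic-level", "all events")

print(mismatches(p1, c1))  # the attributes something in between has to reconcile
```

An empty result would mean P1 and C1 can be wired together directly; every name returned is a difference that some intermediate step has to absorb.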
Challenges of dataflow in the enterprise
• Messaging addresses only a small subset of the problem space
• Needed to understand the big picture
• Needed the ability to make immediate changes
• Must maintain chain of custody for data
• Rigorous security and compliance requirements
Messaging Systems as Dataflow
Great options, including:
• Kafka
• ActiveMQ
• Tibco
Let us consider the perfect messaging system for this talk:
• It has zero latency
• It has perfect data durability
• It supports unlimited consumers and producers
Dataflow as a messaging problem
“But my system needs…”
• A different format and/or schema
• To use a different protocol
• The highest priority information first
• Large objects (event batches) / Small Objects (streams)
• Authorization to the data level
• Only a subset of the data on a topic
• Data that is enriched/sanitized before it arrives
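Several of these needs (a subset of the topic, highest priority first, a different format) amount to a per-consumer adaptation step. The sketch below is one hypothetical way to express that step in plain Python; the function name, the event fields, and the convention that a lower number means more urgent are assumptions made for illustration, not part of any messaging system's API.

```python
import json
from typing import Callable, Iterable, List


def adapt_for_consumer(
    events: Iterable[dict],
    wanted: Callable[[dict], bool],
    convert: Callable[[dict], str],
) -> List[str]:
    """Filter to the relevant subset, order by priority, then convert the format."""
    selected = [e for e in events if wanted(e)]
    # Assumption: a lower "priority" number means more urgent.
    selected.sort(key=lambda e: e["priority"])
    return [convert(e) for e in selected]


# A made-up topic with three events.
topic = [
    {"id": 1, "region": "eu", "priority": 5},
    {"id": 2, "region": "us", "priority": 1},
    {"id": 3, "region": "eu", "priority": 0},
]

out = adapt_for_consumer(
    topic,
    wanted=lambda e: e["region"] == "eu",            # only a subset of the topic
    convert=lambda e: json.dumps(e, sort_keys=True),  # consumer wants JSON text
)
print(out)
```

Every consumer with different needs requires its own version of this step, which is where the extra systems and topics come from.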
Using Messaging
Only a subset agree using messaging:
1. Protocol
2. Format
3. Schema
4. Priority
5. Size of event
6. Frequency of event
7. Authorization access
8. Relevance
[Diagram: producer P1 publishes through Messaging to consumers C1 through CN]
More issues to consider:
• How do you know what the data flow looks like?
• How is it managed?
• How is it working – today, yesterday?
How these issues are typically solved
• Add new systems to handle the protocol differences
• Add new systems to convert the data
• Add new systems to reorder the data
• Add new systems to filter the unauthorized data
• Add new topics to represent ‘stages of the flow’
Which leads to latency, complexity, and limited retention
Ultimately, the operations teams who handle data at flow boundaries become responsible for managing.
Real-time Data Flow
It’s not just how quickly you move data – it’s about how quickly you can change behavior and seize new opportunities
Introducing Apache NiFi
• Guaranteed delivery
• Data buffering
  - Backpressure
  - Pressure release
• Prioritized queuing
• Flow specific QoS
  - Latency vs. throughput
  - Loss tolerance
• Data provenance
• Recovery/recording a rolling log of fine-grained history
• Visual command and control
• Flow templates
• Pluggable/multi-role security
• Designed for extension
• Clustering
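Two of the features above, backpressure and prioritized queuing, can be pictured as a bounded queue that hands out the most urgent item first. The following is a minimal sketch of that idea only, not NiFi's actual implementation: the `Connection` class, its `offer`/`poll` methods, and the convention that a lower number means higher priority are all hypothetical.

```python
import heapq
import itertools


class Connection:
    """Bounded, prioritized queue between two processing steps (illustrative only)."""

    def __init__(self, max_queued: int):
        self.max_queued = max_queued
        self._heap = []
        # Tie-breaker so items with equal priority leave in arrival order.
        self._order = itertools.count()

    def is_full(self) -> bool:
        return len(self._heap) >= self.max_queued

    def offer(self, item, priority: int) -> bool:
        """Enqueue unless the threshold is reached; False signals backpressure."""
        if self.is_full():
            return False
        heapq.heappush(self._heap, (priority, next(self._order), item))
        return True

    def poll(self):
        """Dequeue the most urgent item (lowest number), or None if empty."""
        if not self._heap:
            return None
        return heapq.heappop(self._heap)[2]


conn = Connection(max_queued=2)
accepted = [conn.offer("routine", 5), conn.offer("urgent", 1), conn.offer("late", 3)]
first = conn.poll()
print(accepted, first)
```

The third `offer` is refused because the queue is at its threshold; an upstream step would be expected to hold or slow down until `poll` frees space.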
A Brief History
2006: NiagaraFiles (NiFi) was first conceived by Joe Witt at the National Security Agency (NSA).
November 2014: NiFi is donated to the Apache Software Foundation (ASF) through NSA’s Technology Transfer Program and enters ASF’s incubator.
July 2015: NiFi reaches ASF top-level project status.
Flow Based Programming (FBP)
FBP Term | NiFi Term | Description
Information Packet | FlowFile | Each object moving through the system.
Black Box | FlowFile Processor | Performs the work, doing some combination of data routing, transformation, or mediation between systems.
Bounded Buffer | Connection | The linkage between processors, acting as queues and allowing various processes to interact at differing rates.
Scheduler | Flow Controller | Maintains the knowledge of how processes are connected, and manages the threads and allocations thereof which all processes use.
Subnet | Process Group | A set of processes and their connections, which can receive and send data via ports. A process group allows the creation of an entirely new component simply by composing its components.
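To show how these terms relate, here is a toy model in plain Python. It is a sketch under stated assumptions and does not use the NiFi API: the `FlowFile` dataclass, the two sample processors, and `run_group` are hypothetical names, with `run_group` standing in for both a process group and the scheduling role of the flow controller.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class FlowFile:
    """An object moving through the flow: content plus key/value attributes."""
    content: bytes
    attributes: Dict[str, str] = field(default_factory=dict)


# A processor is modelled as a function from one FlowFile to another.
Processor = Callable[[FlowFile], FlowFile]


def uppercase(ff: FlowFile) -> FlowFile:
    """Transformation step: upper-case the content."""
    return FlowFile(ff.content.upper(), dict(ff.attributes))


def tag_size(ff: FlowFile) -> FlowFile:
    """Enrichment step: record the content length as an attribute."""
    return FlowFile(ff.content, {**ff.attributes, "size": str(len(ff.content))})


def run_group(processors: List[Processor], inputs: List[FlowFile]) -> List[FlowFile]:
    """Chain processors through queues; this loop plays the scheduler's role."""
    queue: Deque[FlowFile] = deque(inputs)  # connection feeding the first processor
    for proc in processors:
        next_queue: Deque[FlowFile] = deque()  # connection to the next processor
        while queue:
            next_queue.append(proc(queue.popleft()))
        queue = next_queue
    return list(queue)


result = run_group([uppercase, tag_size], [FlowFile(b"hello")])
print(result[0].content, result[0].attributes)
```

Because `run_group` takes FlowFiles in and hands FlowFiles out, a group behaves like a single processor from the outside, which is the composition property the last table row describes.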
Architecture
[Diagram: a single NiFi instance. Within the OS/Host, a JVM runs a Web Server and a Flow Controller hosting Processor 1 through Extension N, backed by the FlowFile Repository, Content Repository, and Provenance Repository on Local Storage.]
[Diagram: a NiFi cluster. The master, the NiFi Cluster Manager (NCM), runs in its own JVM with a Web Server and a Request Replicator. The slaves are NiFi Nodes, each with the same structure as the single instance.]
Learn more and join us!
Apache NiFi site: http://nifi.apache.org
Subscribe to and collaborate at: [email protected]@nifi.apache.org
Submit Ideas or Issues: https://issues.apache.org/jira/browse/NIFI
Follow us on Twitter: @apachenifi