Processing Data of Any Size with Apache Beam
1/21 Copyright © 2016 Smoking Hand LLC. All rights reserved. Version: bc7f1cf
About Big Data Institute
Mentoring, training, and high-level consulting company focused on Big Data, NoSQL and The Cloud
Founded in 2008
We help make companies successful with Big Data projects
Ongoing team mentoring
Use case evaluation
Management training
Technical training
Architecture reviews
Live and email programming support
Go to http://www.bigdatainstitute.io for more information
About You
Your experience as a developer, analyst or administrator
Which languages you use
Experience with Hadoop, Big Data or NoSQL
Expectations from this class
Chapter 1
Introducing Apache Beam
Introducing Apache Beam
What Is Beam?
Why Use Beam?
Using Beam
Apache Beam
Apache Beam is a unified model for processing data
Was originally created at Google
  Later donated to the Apache Software Foundation as Apache Beam
  Now an Apache top-level project
Beam code is written to its API
  Code is executed on different runners
  Not directly tied to a framework or runner
All interactions are done through pipelines
Beam Pipelines Diagram
Pipeline: Source -> DoFn -> DoFn -> Sink
All work is encapsulated in a Pipeline.
The Source reads the input one record or row at a time.
The DoFns take in the input, process it, and emit the results.
The Sink saves the output of the DoFn to the targeted path.
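The flow in the diagram can be sketched in plain Java. This is only an illustration of the idea, not the Beam API: MiniPipeline and its apply and run methods are hypothetical names invented for this sketch, and each step stands in for a DoFn that takes one record and emits zero or more results.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Hypothetical stand-in for a pipeline; plain Java, not the Beam API.
final class MiniPipeline {
    // Each step plays the role of a DoFn: one record in, zero or more records out.
    private final List<Function<String, List<String>>> steps = new ArrayList<>();

    MiniPipeline apply(Function<String, List<String>> step) {
        steps.add(step);
        return this;
    }

    // The source hands over one record at a time; the returned list acts as the sink.
    List<String> run(Iterable<String> source) {
        List<String> sink = new ArrayList<>();
        for (String record : source) {
            List<String> current = List.of(record);
            for (Function<String, List<String>> step : steps) {
                List<String> next = new ArrayList<>();
                for (String item : current) {
                    next.addAll(step.apply(item));
                }
                current = next;
            }
            sink.addAll(current);
        }
        return sink;
    }

    public static void main(String[] args) {
        List<String> out = new MiniPipeline()
            .apply(line -> Arrays.asList(line.split(" ")))   // first step: split into words
            .apply(word -> List.of(word.toUpperCase()))      // second step: upper-case each word
            .run(List.of("the boy", "has no patience"));
        System.out.println(out);
    }
}
```

A real runner distributes this work across machines; the sketch only shows the order in which records move from source to sink.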
Beam Windowing
Diagram: three rows of events labeled Juan, Fatima, and Mark on a timeline from 14:00 to 16:00 in 30-minute steps.
Sessions: data is broken into sessions based on a timeout between actions.
Fixed windows: data is calculated in windows of a fixed length that do not overlap.
Sliding windows: data is calculated in windows of a fixed length whose start advances by a set period, so windows can overlap.
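The three window types can be worked through with plain arithmetic. The sketch below is not the Beam windowing API; WindowSketch and its method names are hypothetical, timestamps are assumed to be minutes since midnight (14:00 is 840), and the session logic is a simplification that assumes the events of one person arrive already sorted.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java illustration of window assignment; not the Beam windowing API.
final class WindowSketch {
    // Fixed windows: each timestamp falls into exactly one window of length `size`.
    static long fixedWindowStart(long ts, long size) {
        return ts - Math.floorMod(ts, size);
    }

    // Sliding windows: windows of length `size` start every `period`,
    // so one timestamp can belong to several overlapping windows.
    static List<Long> slidingWindowStarts(long ts, long size, long period) {
        List<Long> starts = new ArrayList<>();
        long lastStart = ts - Math.floorMod(ts, period);
        for (long start = lastStart; start > ts - size; start -= period) {
            starts.add(start);
        }
        return starts;
    }

    // Sessions: sorted timestamps stay in one session until the gap between
    // two consecutive events reaches `gap`, which starts a new session.
    static List<List<Long>> sessions(List<Long> sortedTs, long gap) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : sortedTs) {
            if (!current.isEmpty() && ts - current.get(current.size() - 1) >= gap) {
                result.add(current);
                current = new ArrayList<>();
            }
            current.add(ts);
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        // An event at 14:25 (865) with 30-minute fixed windows starts at 14:00 (840).
        System.out.println(fixedWindowStart(865, 30));
        // The same event with 60-minute windows sliding every 30 minutes.
        System.out.println(slidingWindowStarts(865, 60, 30));
        // Events split into sessions by a 30-minute timeout.
        System.out.println(sessions(List.of(840L, 845L, 850L, 920L, 925L), 30));
    }
}
```

With these inputs the 14:25 event lands in the fixed window starting at 14:00, in the two sliding windows starting at 14:00 and 13:30, and the five events form two sessions because of the 70-minute pause after 14:10.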
Too Many APIs
Learning a framework-specific API every time a new framework comes out, or an existing framework completely changes its API, doesn't create value
General Architecture Diagram
Diagram components: data sources, Kafka cluster, real-time processing, Hadoop cluster, RDBMS, BI analytics.
Real-time data is published to Kafka.
Spark Streaming, Storm, or Kafka consumers process in real time.
Batch data is saved to HDFS.
MapReduce, Hive, Pig, Crunch, and Spark process data stored in HDFS.
Real-time data is archived to HDFS for analytics and offline processing.
Why I'm Excited About Beam
One API to rule them all
  One API to learn
  Move between frameworks
The most unified batch and stream API I've used
Unified API to the ecosystem
Risk mitigation of frameworks
Multiple languages
Running Beam
Beam isn't tied to a specific framework
Apache Spark uses spark-submit
Apache Flink can be submitted with the Maven runner
Google Cloud Dataflow can be submitted with the Maven runner
The Direct Runner can be started with the Maven runner
Beam Contributions
MapElements
Input:
    I cannot teach him
    The boy has no patience
Code:
    PCollection<String> etl = lines.apply(
        MapElements.via((String line) -> line.toUpperCase())
            .withOutputType(TypeDescriptors.strings()));
Output:
    I CANNOT TEACH HIM
    THE BOY HAS NO PATIENCE
Regex Transform
Input:
    I cannot teach him. The boy has no patience. He will learn patience.
Code:
    PCollection<String> linecount = lines.apply(Regex.matches("I.*\\."));
Output:
    I cannot teach him. The boy has no patience.
Regular expressions can be used to parse KVs
Input:
    I cannot teach him. The boy has no patience. He will learn patience.
Code:
    PCollection<KV<String, String>> twoSentences = lines.apply(Regex.findKV("(.*)\\.(.*)", 1, 2));
Output:
    <I cannot teach him, The boy has no patience>
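The two patterns can be tried out with java.util.regex alone. The sketch below assumes, without confirmation from the slides, that the Regex transform evaluates patterns the same way: a whole-element match for Regex.matches and a first find for Regex.findKV. RegexSketch, matchWhole, and findPair are hypothetical helper names, and where the input line breaks fall is also an assumption.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Uses java.util.regex directly; assumed, not confirmed, to mirror the Regex transform.
final class RegexSketch {
    // Whole-string match: returns the line if the entire line matches, otherwise null.
    static String matchWhole(String regex, String line) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.matches() ? m.group() : null;
    }

    // First find() hit, returned as a {key, value} pair built from two capture groups.
    static String[] findPair(String regex, String line, int keyGroup, int valueGroup) {
        Matcher m = Pattern.compile(regex).matcher(line);
        return m.find() ? new String[] {m.group(keyGroup), m.group(valueGroup)} : null;
    }

    public static void main(String[] args) {
        // A line must start with "I" and end with a period to match as a whole.
        System.out.println(matchWhole("I.*\\.", "I cannot teach him. The boy has no patience."));
        System.out.println(matchWhole("I.*\\.", "He will learn patience."));
        // The greedy (.*) splits at the LAST period on the line.
        System.out.println(Arrays.toString(
            findPair("(.*)\\.(.*)", "I cannot teach him. The boy has no patience", 1, 2)));
        System.out.println(Arrays.toString(
            findPair("(.*)\\.(.*)", "I cannot teach him. The boy has no patience.", 1, 2)));
    }
}
```

Because (.*) is greedy, the key/value split happens at the last period on the line. Under these assumptions a line that ends in a period yields an empty value, so the pattern is worth checking against real input before relying on it.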
Example Custom DoFn
Input:
    I cannot teach him. The boy has no patience. He will learn patience.
Code:
    PCollection<String> pats = lines.apply(ParDo.of(new PatLinesFN()));

    static class PatLinesFN extends DoFn<String, String> {
        @ProcessElement
        public void processElement(DoFn<String, String>.ProcessContext context) throws Exception {
            // Split the line on spaces and emit every piece that starts with "pat".
            String[] pieces = context.element().split(" ");
            for (String piece : pieces) {
                if (piece.startsWith("pat")) {
                    context.output(piece);
                }
            }
        }
    }
Output:
    patience.
    patience.
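The filtering logic inside the DoFn can be checked without a pipeline. This is a minimal plain-Java sketch that assumes the line is split on single spaces; PatWords and patPieces are hypothetical names, and the Beam context is replaced by a returned list.

```java
import java.util.ArrayList;
import java.util.List;

// Same filtering idea as the DoFn body, minus the Beam context (hypothetical helper).
final class PatWords {
    // Split on single spaces and keep the pieces that start with "pat".
    static List<String> patPieces(String line) {
        List<String> out = new ArrayList<>();
        for (String piece : line.split(" ")) {
            if (piece.startsWith("pat")) {
                out.add(piece);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(patPieces(
            "I cannot teach him. The boy has no patience. He will learn patience."));
        // prints [patience., patience.]
    }
}
```

The trailing periods stay attached because the split is on spaces only; stripping punctuation would need a different split pattern or an extra cleanup step.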
Playing Card Algorithm
    import org.apache.beam.sdk.Pipeline;
    import org.apache.beam.sdk.io.TextIO;
    import org.apache.beam.sdk.options.PipelineOptions;
    import org.apache.beam.sdk.options.PipelineOptionsFactory;
    import org.apache.beam.sdk.transforms.Count;
    import org.apache.beam.sdk.transforms.Regex;
    import org.apache.beam.sdk.transforms.ToString;

    public class PicoWordCount {
        public static void main(String[] args) {
            PipelineOptions options = PipelineOptionsFactory.create();
            Pipeline p = Pipeline.create(options);

            // Newer Beam releases spell the I/O calls TextIO.read().from(...) and TextIO.write().to(...).
            p.apply(TextIO.Read.from("playing_cards.tsv"))       // read the input file line by line
                .apply(Regex.split("\\W+"))                      // split each line into word tokens
                .apply(Count.perElement())                       // count occurrences of each token
                .apply(ToString.elements())                      // render each count as a string
                .apply(TextIO.Write.to("output/stringcounts"));  // write the results

            p.run();
        }
    }
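The same split-then-count logic can be run locally with Java streams, which helps when reasoning about what the pipeline should produce. This is a sketch outside of Beam: LocalWordCount is a hypothetical name, the sample lines are invented because the contents of playing_cards.tsv are not shown, and dropping empty tokens is an assumption about the desired behavior.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Local, single-machine version of the split-and-count steps (not the Beam API).
final class LocalWordCount {
    static Map<String, Long> count(List<String> lines) {
        return lines.stream()
            .flatMap(line -> Arrays.stream(line.split("\\W+")))  // split on runs of non-word characters
            .filter(token -> !token.isEmpty())                   // assumption: skip empty tokens
            .collect(Collectors.groupingBy(
                Function.identity(), TreeMap::new, Collectors.counting()));
    }

    public static void main(String[] args) {
        // Invented sample lines; the real file contents are not shown in the slides.
        System.out.println(count(List.of("Spade\tAce", "Heart\tAce")));
        // prints {Ace=2, Heart=1, Spade=1}
    }
}
```

The TreeMap only makes the printed order stable; a distributed runner gives no such ordering guarantee for its output.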
Next Steps
What are other people doing with Beam? http://tiny.jesse-anderson.com/beaminterview
Where is some sample Beam code? http://tiny.jesse-anderson.com/beamtutorial
Main Beam site: https://beam.apache.org/
Convincing your boss: http://tiny.jesse-anderson.com/beam1 http://tiny.jesse-anderson.com/beam2
About Me
Current: Instructor, Thought Leader, Monkey Tamer
Previously: Curriculum Developer and Instructor @ Cloudera, Senior Software Engineer @ Intuit
Covered, Conferences and Published In: GigaOM, Ars Technica, Pragmatic Programmers, Strata, OSCON, Wall Street Journal, CNN, BBC, NPR
See Me On: http://www.jesse-anderson.com @jessetanderson http://tiny.bdi.io/linkedin http://tiny.bdi.io/youtube