The open source SnapLogic data integration framework. Overview, examples, screenshots.
Data Integration with Server Side Mashups
Juergen Brendel, Principal Software Engineer
OSDC 2007, Brisbane
Slide 2
Agenda
• The SnapLogic project
• Client-side mashups
• Problems and solutions
• Data integration with SnapLogic
Slide 3
The SnapLogic project
• Founded 2005, data integration background
• Vision:
  – Reusable data integration resources
  – REST
  – Web-based GUI
  – Programmatic interface
  – Open Source
• Python... Why not?
• www.snaplogic.com
Slide 4
What's a mashup?
• A 'Web 2.0 kind of thing'
• Combine, aggregate, visualise
  – Multiple sources
  – Multiple dimensions
• Typically on the client side
  – Browser
  – Ajax
Slide 5
Self-made mashups
• Hand coded
• Mashup editors
  – GUI mashup-logic editor
  – Wiki-style
  – Hosted
Slide 6
Benefits for the enterprise?
Yeah, right...
Enable knowledge workers!!!
Situational applications!
Avoid the IT bottleneck!!
Slide 7
Problems with client-side mashups
• Skill
• Internal data often not web-friendly
• Maintenance
• Security
• Performance
Slide 8
Solution: Server-side mashups
• Flexible access
• Security
• Performance
Slide 9
SnapLogic data integration philosophy
• Clearly defined REST resources
• Data reuse and integration
• Pipelines
• Framework for resource-specific scripting
• Open source and community
Slide 10
Example: Resources
[Diagram: a client sends an HTTP request to HTTP://server1.example.com/customer_list on the SnapLogic Server and receives the response as JSON. On the server, a component driven by a resource definition reads from databases, files, applications, or Atom / RSS feeds.]
Resource definition:
• Resource name
• HTTP://server1.example.com/customer_list
• SQL query or filename
• Credentials
• Parameters
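From the client's point of view, the resource on this slide is an ordinary HTTP URL that answers with JSON. The following is a minimal sketch of such a client, not SnapLogic code. It assumes the server returns a JSON array of objects; the field names in the sample body and the helper names `parse_records` and `fetch_records` are hypothetical.

```python
import json
from urllib.request import urlopen

# URL taken from the slide; the host is a placeholder and will not resolve.
CUSTOMER_LIST_URL = "http://server1.example.com/customer_list"


def parse_records(body):
    """Decode a JSON response body into a list of record dicts.

    Assumption: the server returns a JSON array of objects.
    """
    records = json.loads(body)
    if not isinstance(records, list):
        raise ValueError("expected a JSON array of records")
    return records


def fetch_records(url=CUSTOMER_LIST_URL):
    """Issue a plain HTTP GET against a resource URL and parse the reply."""
    with urlopen(url) as response:
        return parse_records(response.read().decode("utf-8"))


# Offline demonstration with a hand-written body (hypothetical field names).
sample_body = (
    '[{"name": "Acme", "city": "Brisbane"},'
    ' {"name": "Initech", "city": "Sydney"}]'
)
customers = parse_records(sample_body)
for customer in customers:
    print(customer["name"], customer["city"])
```

Against a live server, `fetch_records()` would replace the hand-written body; the parsing step stays the same.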
Slide 11
Example: Pipelines
[Diagram: a client sends an HTTP request to HTTP://server1.example.com/processed_customer_list on the SnapLogic Server and receives the response as JSON. On the server, a pipeline chains three components, each with its own resource definition: Read, Geocode, Sort. The data comes from databases, files, applications, or Atom / RSS feeds.]
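The Read, Geocode, Sort chain can be pictured as three stages that each consume the previous stage's records. The sketch below uses plain Python generators to show the idea; it is not the SnapLogic pipeline API. The CSV layout, the field names, and the `COORDINATES` lookup table (approximate values standing in for a real geocoding service) are all assumptions.

```python
import csv
import io

# Hypothetical lookup table standing in for a real geocoding service.
COORDINATES = {"Brisbane": (-27.47, 153.03), "Sydney": (-33.87, 151.21)}


def read(csv_text):
    """Read stage: yield one dict per CSV row."""
    yield from csv.DictReader(io.StringIO(csv_text))


def geocode(records):
    """Geocode stage: attach latitude and longitude to each record."""
    for record in records:
        lat, lon = COORDINATES.get(record["city"], (None, None))
        yield {**record, "lat": lat, "lon": lon}


def sort_by(records, field):
    """Sort stage: order the records by one field."""
    return sorted(records, key=lambda record: record[field])


source = "name,city\nInitech,Sydney\nAcme,Brisbane\n"
processed = sort_by(geocode(read(source)), "name")
for row in processed:
    print(row)
```

Each stage only knows about records, not about where they came from, which is what lets a whole pipeline be exposed as one new resource.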
Slide 12
A simple pipeline: Filtering leads
Slide 13
Linking fields in a pipeline
Slide 14
Reusing a pipeline as a resource
Slide 15
Reusing a pipeline as a resource
Slide 16
Reusing a pipeline as a resource
Slide 17
Adding new components
• For access logic
• For data transformations
• Independent of data format
• Currently written in Python
Slide 18
A simple processing component
class IncreaseSalary(DataComponent):

    def init(self):
        '''Called when the component is started.'''
        self.increase = float(self.moduleProperties['percent_increase'])

    def processRecord(self, record):
        '''Called for every record.'''
        record.fields['salary'] *= (1 + self.increase/100)
        self.writeRecord(record)
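To see the `init` / `processRecord` lifecycle in action without a SnapLogic server, the component can be driven by a small stand-in harness. In the sketch below, `Record`, `DataComponent`, and the `run` method are hypothetical stubs written for this example; only the `IncreaseSalary` body follows the slide.

```python
class Record:
    """Stand-in record: a bag of named fields (not the SnapLogic class)."""

    def __init__(self, **fields):
        self.fields = dict(fields)


class DataComponent:
    """Stand-in base class (not the SnapLogic one)."""

    def __init__(self, module_properties):
        self.moduleProperties = module_properties
        self.output = []

    def writeRecord(self, record):
        # Collect output in a list instead of writing to an output view.
        self.output.append(record)

    def run(self, records):
        # Hypothetical driver: call init() once, then processRecord() per record.
        self.init()
        for record in records:
            self.processRecord(record)
        return self.output


class IncreaseSalary(DataComponent):

    def init(self):
        self.increase = float(self.moduleProperties['percent_increase'])

    def processRecord(self, record):
        record.fields['salary'] *= (1 + self.increase / 100)
        self.writeRecord(record)


component = IncreaseSalary({'percent_increase': '25'})
result = component.run([Record(name='Smith', salary=40000)])
print(result[0].fields['salary'])
```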
Slide 19
An Apache log file reader
import apachelog

class LogReader(DataComponent):

    def startReading(self):
        '''Called when component does not have input stream.'''
        logfile = open(self._filename, 'rbU')
        format = self.moduleProperties['log_format']

        if format == 'COMMON':
            p = apachelog.parser(apachelog.formats['common'])
        elif ...

        # Read all lines in the logfile
        for line in logfile:
            out_rec = Record(self.getSingleOutputView())
            raw_rec = p.parse(line)
            out_rec.fields['remote_host'] = raw_rec['%h']
            out_rec.fields['client_id'] = raw_rec['%l']
            out_rec.fields['user'] = raw_rec['%u']
            out_rec.fields['server_status'] = int(raw_rec['%>s'])
            out_rec.fields['bytes'] = int(raw_rec['%b'])
            ...

            self.writeRecord(out_rec)
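The field mapping in the reader (`%h`, `%l`, `%u`, `%>s`, `%b`) follows Apache's Common Log Format. As a rough illustration of what the parsing step produces, here is a standard-library sketch that swaps in a regular expression for the `apachelog` parser. The function name `parse_line` and the output dict are assumptions for this example, not part of SnapLogic or `apachelog`.

```python
import re

# Common Log Format: %h %l %u %t "%r" %>s %b
COMMON_LOG = re.compile(
    r'(?P<remote_host>\S+) (?P<client_id>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<server_status>\d{3}) (?P<bytes>\d+|-)'
)


def parse_line(line):
    """Split one Common Log Format line into named, typed fields."""
    match = COMMON_LOG.match(line)
    if match is None:
        raise ValueError("not a Common Log Format line: %r" % line)
    fields = match.groupdict()
    fields['server_status'] = int(fields['server_status'])
    # Apache writes '-' for %b when no body bytes were sent.
    fields['bytes'] = 0 if fields['bytes'] == '-' else int(fields['bytes'])
    return fields


sample = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
parsed = parse_line(sample)
print(parsed['remote_host'], parsed['server_status'], parsed['bytes'])
```

Note the `'-'` case: a bare `int()` on the `%b` field would raise on responses without a body, so a real reader may need the same guard.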
Slide 20
Programmatic access
• GUI is nice, but still limiting
• SnapScript: An API library
• Python, PHP, more to come
Slide 21
Creating a resource
# Create a new resource
staff_res_def = Resource(component='SnapLogic.Components.CsvRead')
staff_res_def.props.URI = '/SnapLogic/Resources/Staff'
staff_res_def.props.description = 'Read from the employee file'
staff_res_def.props.title = 'Staff'
staff_res_def.props.delimiter = '$?{DELIMITER}'
staff_res_def.props.filename = '$?{INPUTFILE}'
staff_res_def.props.parameters = (
    ('INPUTFILE', Param.Required, ''),
    ('DELIMITER', Param.Optional, ',')
)

# Define the output view of the resource
staff_res_def.props.outputview.output1 = (
    ('Last_Name', 'string', 'Employee last name'),
    ('First_Name', 'string', 'Employee first name'),
    ('Salary', 'number', 'Annual income')
)
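The `$?{INPUTFILE}` and `$?{DELIMITER}` placeholders suggest that property values are filled in from the declared parameters at run time. The slides do not spell out the substitution rules, so the sketch below is only one plausible reading: required parameters must be supplied, and optional ones fall back to their declared default. `resolve`, `REQUIRED`, and `OPTIONAL` are hypothetical names, not SnapLogic API.

```python
import re

# Stand-ins for Param.Required / Param.Optional from the slide.
REQUIRED, OPTIONAL = "required", "optional"

# Matches placeholders of the form $?{NAME}.
PLACEHOLDER = re.compile(r"\$\?\{(\w+)\}")


def resolve(template, declared, supplied):
    """Replace each $?{NAME} in template with a supplied or default value."""
    defaults = {}
    for name, kind, default in declared:
        if kind == REQUIRED and name not in supplied:
            raise KeyError("missing required parameter: " + name)
        defaults[name] = default
    values = {**defaults, **supplied}
    return PLACEHOLDER.sub(lambda match: values[match.group(1)], template)


declared = (
    ('INPUTFILE', REQUIRED, ''),
    ('DELIMITER', OPTIONAL, ','),
)
supplied = {'INPUTFILE': 'file://foo/staff.csv'}

filename = resolve('$?{INPUTFILE}', declared, supplied)
delimiter = resolve('$?{DELIMITER}', declared, supplied)
print(filename, repr(delimiter))
```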
Slide 22
Creating a pipeline
# Create a new pipeline
p = Pipeline()
p.props.URI = '/SnapLogic/Pipelines/empl_salary_inc'
p.props.title = 'Employee_Salary_Increase'

# Select the resources in the pipeline
# (increase_salary_res_def is assumed to be defined like staff_res_def)
p.resources.Staff = staff_res_def.instance()
p.resources.PayRaise = increase_salary_res_def.instance()

# Link the resources in the pipeline, using the names assigned above
link = (
    ('Last_Name', 'last'),
    ('First_Name', 'first'),
    ('Salary', 'salary')
)
p.linkViews('Staff', 'output1', 'PayRaise', 'input1', link)
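The `link` tuple pairs each field of the upstream output view with a field of the downstream input view. A minimal sketch of what such a mapping does to a single record is shown below. `apply_link` is a hypothetical helper, and dropping unlinked fields is an assumption rather than documented SnapLogic behaviour.

```python
def apply_link(record, link):
    """Rename output-view fields to input-view fields.

    Assumption: fields that are not named in the link are dropped.
    """
    return {target: record[source] for source, target in link}


link = (
    ('Last_Name', 'last'),
    ('First_Name', 'first'),
    ('Salary', 'salary'),
)

staff_row = {'Last_Name': 'Smith', 'First_Name': 'Jo', 'Salary': 40000}
linked = apply_link(staff_row, link)
print(linked)
```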
Slide 23
Pipeline parameters
# Define the user-visible parameters of the pipeline
p.props.parameters = (
    ('INCREASE', Param.Required, ''),
)

# Map values to the parameters of the pipeline's resources
p.props.parammap = (
    (Param.Parameter, 'INCREASE', 'PayRaise', 'PERC_INCREASE'),
    (Param.Constant, 'file://foo/staff.csv', 'Staff', 'INPUTFILE')
)

# Confirm correctness and publish as a new resource
p.check()
p.saveToServer(connection)
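Each `parammap` entry reads as (kind, value, resource, parameter name): a `Param.Parameter` entry forwards a pipeline-level argument, while a `Param.Constant` entry pins a fixed value. The sketch below shows how such a map might be resolved into per-resource parameters. This is an interpretation of the slide, and `resolve_parammap`, `PARAMETER`, and `CONSTANT` are hypothetical names.

```python
# Stand-ins for Param.Parameter / Param.Constant from the slide.
PARAMETER, CONSTANT = "parameter", "constant"


def resolve_parammap(parammap, pipeline_args):
    """Turn a parammap into {resource name: {parameter name: value}}."""
    per_resource = {}
    for kind, value, resource, name in parammap:
        if kind == PARAMETER:
            # Forward the caller-supplied pipeline argument.
            resolved = pipeline_args[value]
        else:
            # Constants are passed through unchanged.
            resolved = value
        per_resource.setdefault(resource, {})[name] = resolved
    return per_resource


parammap = (
    (PARAMETER, 'INCREASE', 'PayRaise', 'PERC_INCREASE'),
    (CONSTANT, 'file://foo/staff.csv', 'Staff', 'INPUTFILE'),
)

resolved = resolve_parammap(parammap, {'INCREASE': '5'})
print(resolved)
```

With this reading, a caller of the published pipeline only ever sees `INCREASE`; the input file stays an internal detail of the `Staff` resource.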
Slide 24
The end
Any questions?