Scott Stewart
Biography
About
ETL Architect for BNY Mellon NEXEN Analytics
Over 5 Years Pentaho Experience
Over 10 years of data integration and ETL experience
Google, NetApp, Sony Ericsson, Cisco

External Forces are Creating Digital Disruption
• Non-traditional Competitors
• Decreasing Technology Barrier to Entry
• Enabling Productivity and Efficiency
• Security and Risk Resiliency
• Millennial Consumer Behavior
• Agile Environments
• Big Data Insights and Analytics
• Focus on Client Experience
• Investment in Innovation
• Modern Consumer Behavior
• Increasing Regulatory Change
• Global Political Turmoil
• Low Global Economic Growth
• Low Interest Rates
• Changing Investor Demands
• Increasing Cyber Security Threats
• Evolution of Marketplace Lending
• Transparency in Financial Services

Servicing Multiple Needs Through Common Components
[Diagram labels: Professionals, Investors, Developers, Employees, Traders, Machines; Access, Services, Data, Infrastructure; Browser / Mobile, Electronic (APIs); Business Services, Workflows, Third-Party Solutions, Foundational Services; Solutions, Digital Pulse; BXP, Non-BXP; Legacy Solutions, Private Cloud, Public Cloud]
BNY Mellon NEXEN℠ Analytics is a service that consumes and is consumed by several services of the NEXEN Digital Platform
• BXP Digital Cloud based platform
• Pulse integration
• API Gateway integration

Achieving Success with NEXEN Analytics
Success:
• Improve Client Experience
• Reduce Cost of Ownership
• Integration to BNY Technology
• Reduce Microsoft Developed Technology
• Big Data Integration

Why Define Best Practices
• Adaptable to Changing Environment
• Improve Maintainability
• Improve Quality
• Foster Organizational Learning
• Foster Scalability

Establish Clear Naming Conventions
Name Files Responsibly
• Make it simple to follow
• Make it describe purpose of each file
• Don’t use spaces or slashes
Example
✓ Use camel case names
✓ Use verb-noun pattern
▪ loadProductData.kjb
▪ extractProductCodes.ktr
Alternate Example
✓ Use underscore delimiter
✓ Prefix file name with “j_” for job, “t_” for transformation
▪ j_load_product_data.kjb
▪ t_extract_product_codes.ktr
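A naming convention is easier to keep when it can be checked automatically. The following is a minimal sketch, not part of the original slides, of how the two example conventions might be validated. The function name `check_file_name` and both regular expressions are assumptions made for illustration, and a pattern can only check the shape of a name, not whether it really starts with a verb.

```python
import re

# Hypothetical patterns for the two example conventions.
# camelCase: lower-case first word, then one or more capitalized words.
CAMEL_CASE = re.compile(r"^[a-z]+(?:[A-Z][a-z0-9]*)+\.(kjb|ktr)$")
# Prefixed: "j_" or "t_", then lower-case words joined by underscores.
PREFIXED = re.compile(r"^(j|t)_[a-z0-9]+(?:_[a-z0-9]+)+\.(kjb|ktr)$")

# The prefix must agree with the file type: j_ for jobs, t_ for transformations.
VALID_PAIRS = {("j", "kjb"), ("t", "ktr")}


def check_file_name(name: str, convention: str = "camel") -> list[str]:
    """Return a list of problems found in a job/transformation file name."""
    problems = []
    if re.search(r"[ /\\]", name):
        problems.append("contains a space or slash")
    if convention == "camel":
        if not CAMEL_CASE.match(name):
            problems.append("does not look like camelCase with a .kjb/.ktr extension")
    elif convention == "prefixed":
        match = PREFIXED.match(name)
        if match is None:
            problems.append("does not look like j_/t_ plus underscore-delimited words")
        elif (match.group(1), match.group(2)) not in VALID_PAIRS:
            problems.append("prefix does not match the file type")
    else:
        raise ValueError(f"unknown convention: {convention}")
    return problems


if __name__ == "__main__":
    for candidate in ["loadProductData.kjb", "Load Product Data.kjb"]:
        print(candidate, check_file_name(candidate))
```

A check like this could run in a pre-commit hook or a build step so that badly named files are caught before review.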

Organize Your Project
Define Project Folder Hierarchy
• Use folders to organize files; don’t put everything into a single folder
• Give every folder a README.md file that describes its purpose and content
• Keep data and code in separate trees
• Rule of 7: if a folder holds more than 7 objects, consider breaking it into subfolders
• Create a Folder for Each Individual Pipeline
• Designate a Folder for Shared Code
• Categorize Configuration Files by Environment
• Distinctly Isolate Data from the Code
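The folder guidelines above (a README.md in every folder, and no more than 7 objects per folder) can be audited with a short script. This is a sketch under stated assumptions: the function name `audit_folders` is hypothetical, and README.md itself is assumed not to count toward the 7-object limit.

```python
from pathlib import Path

MAX_OBJECTS = 7  # threshold taken from the "greater than 7 objects" guideline


def audit_folders(root: Path) -> list[str]:
    """Walk a project tree and report folders that break the guidelines."""
    findings = []
    folders = [root] + sorted(p for p in root.rglob("*") if p.is_dir())
    for folder in folders:
        # Assumption: README.md is not counted as one of the folder's objects.
        entries = [p for p in folder.iterdir() if p.name != "README.md"]
        if not (folder / "README.md").is_file():
            findings.append(f"{folder}: missing README.md")
        if len(entries) > MAX_OBJECTS:
            findings.append(f"{folder}: {len(entries)} objects, consider subfolders")
    return findings


if __name__ == "__main__":
    for finding in audit_folders(Path(".")):
        print(finding)
```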

Clearly Name Each Step
A Little Effort Goes a Long Way
• What does this do?
• Simple names make the intent clearer

Clearly Name Each Step
Do Not Use Step Name Defaults
Step Names Should:
• Indicate the purpose of the step
• Be in Title Case, except for specific names
Establish guidelines for naming so the team understands what is expected.
Rules and examples by step type:
Table Input/Output: specify the primary table or the nature of the join
• Read Product Code Table
• Write prod_info Table
Text Input/Output: specify the filename or filename pattern
• Read Prod*.csv file
• Write ProductInventory.txt
Filter Values: specify what passes through the filter, or the condition
• Pass On Today’s Data
• Is Processed Flag Set?
Select Values: indicate the nature of the columns filtered or modified
• Clean Temp Columns
• Sync to Merge Stream
Lookup/Merge: indicate the lookup/merge source
• Look Up prod_type By prod_code
• Merge Product Detail with Product Options
Get/Set Variables, Get System Info: name the variables
• Get System Date and IP Address
• Set $hostname
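Steps that still carry a default name can be found by scanning the transformation file. The sketch below assumes that a .ktr file is XML whose root element has `<step>` children, each with a `<name>` child; the set of default names is a small illustrative sample, not a complete list, and `find_default_step_names` is a hypothetical helper.

```python
import re
import xml.etree.ElementTree as ET

# Illustrative sample of default step names (an assumption, not exhaustive).
DEFAULT_NAMES = {
    "Table input",
    "Table output",
    "Text file input",
    "Text file output",
    "Filter rows",
    "Select values",
}


def find_default_step_names(ktr_xml: str) -> list[str]:
    """Return step names that look like untouched defaults."""
    root = ET.fromstring(ktr_xml)
    flagged = []
    for step in root.findall("step"):
        name = (step.findtext("name") or "").strip()
        # A repeated default often gets a numeric suffix, e.g. "Select values 2".
        base = re.sub(r" \d+$", "", name)
        if base in DEFAULT_NAMES:
            flagged.append(name)
    return flagged


if __name__ == "__main__":
    sample = (
        "<transformation>"
        "<step><name>Table input</name><type>TableInput</type></step>"
        "<step><name>Read Product Code Table</name><type>TableInput</type></step>"
        "</transformation>"
    )
    print(find_default_step_names(sample))
```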

Make “Code” Readable
Make It Easy To Follow The Flow Of Data
• Standardize Grid Size
• Top to Bottom vs. Left to Right
• Clearly indicate the main stream; visually it should show one straight line, either:
▪ Straight down
▪ An ordered "Z" pattern
• Rule of 15: at most 15 objects on the canvas
▪ If a job has too many objects, group them into sub-jobs
▪ If a transformation has too many objects, group them into sub-transformations (or mappings)
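The 15-object limit can also be checked mechanically. This sketch assumes that a transformation file stores its steps as `<step>` children of a `<transformation>` root, and that a job file stores its entries as `<entry>` elements under `<entries>` in a `<job>` root. Notes on the canvas are not counted, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

MAX_CANVAS_OBJECTS = 15  # limit from the "Rule of 15" guideline


def count_canvas_objects(xml_text: str) -> int:
    """Count steps in a transformation or entries in a job (notes excluded)."""
    root = ET.fromstring(xml_text)
    if root.tag == "transformation":
        return len(root.findall("step"))
    if root.tag == "job":
        return len(root.findall("entries/entry"))
    raise ValueError(f"unexpected root element: {root.tag}")


def exceeds_limit(xml_text: str, limit: int = MAX_CANVAS_OBJECTS) -> bool:
    """True when the file holds more objects than the guideline allows."""
    return count_canvas_objects(xml_text) > limit


if __name__ == "__main__":
    steps = "".join(f"<step><name>Step {i}</name></step>" for i in range(16))
    print(exceeds_limit(f"<transformation>{steps}</transformation>"))
```

A file that exceeds the limit is a candidate for splitting into sub-jobs or sub-transformations, as the slide suggests.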

Use Comments
Define a Standard for Making Comments (“Notes”)
• Ensure that every job or transformation has a header comment
• Make sure the format of each comment is standardized
• Define how you will structure the content within the comment
• Color-coordinate your notes, e.g. Complex Logic, To-Do’s

Don’t Hardcode, Use Variables
• Database Connection Details
• File Path Definitions
• Built-Ins For Code Location
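To illustrate the idea, the sketch below resolves `${NAME}` placeholders from a per-environment set of values, so the same definition works in development and production. It is a stand-alone Python illustration of the principle, not how the tool itself performs substitution; the variable names (`DB_HOST`, `INPUT_DIR`), the host names, and the paths are all hypothetical.

```python
import re

PLACEHOLDER = re.compile(r"\$\{([^}]+)\}")


def resolve(template: str, variables: dict[str, str]) -> str:
    """Replace every ${NAME} in the template, failing on undefined names."""
    def lookup(match: re.Match) -> str:
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"undefined variable: {name}")
        return variables[name]
    return PLACEHOLDER.sub(lookup, template)


# Hypothetical per-environment values, e.g. loaded from one file per environment.
DEV = {"DB_HOST": "dev-db.example.internal", "INPUT_DIR": "/data/dev/in"}
PROD = {"DB_HOST": "prod-db.example.internal", "INPUT_DIR": "/data/prod/in"}

if __name__ == "__main__":
    url = "jdbc:postgresql://${DB_HOST}:5432/analytics"
    print(resolve(url, DEV))
    print(resolve("${INPUT_DIR}/Prod*.csv", PROD))
```

Failing loudly on an undefined variable is a deliberate choice here: a missing value surfaces at start-up instead of as a malformed connection string or path later in the run.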

Bringing It All Together
Don’t Abandon Standard Development Best Practices
• Use a code repository to track changes
• Foster frequent team communication
• Hold code reviews
• Establish testing standards