Integrating Data from Multiple Sources
2015-02-26
David Loshin
Knowledge Integrity, Inc.
© 2015 Knowledge Integrity, Inc. [email protected] (301) 754-6350
Ingesting Data from Multiple Sources
• Continuously streamed data sources may influence business performance analytics:
– Influence customer satisfaction
– Expose opportunities for revenue generation
– Identify brand risk
– Flag fraud and abuse
– Improve customer profiling and customer experience
Challenges
• Entity identifiability
• Limited or no data governance
• Editorial bias
• Absence of metadata
Entity Identifiability
• Recognizing and resolving identities is challenging for static, complete data sets
• Entity identifiability becomes more challenging when merging static and streamed information:
– Entity attribute identification
– Entity recognition
– Identity resolution
– Linkage across data sets
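The identity resolution step above can be sketched roughly as follows. This is only a minimal illustration, assuming every attribute is a string and that averaging per-attribute similarity is an acceptable matching rule; the function names, the record layout, and the 0.85 threshold are hypothetical and not part of the original presentation.

```python
from difflib import SequenceMatcher


def normalize(value: str) -> str:
    """Lowercase, drop punctuation, and collapse runs of whitespace."""
    kept = "".join(ch for ch in value.lower() if ch.isalnum() or ch.isspace())
    return " ".join(kept.split())


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two normalized strings."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def likely_same_entity(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Average the similarity of the shared attributes and compare to a threshold.

    The threshold is an assumed tuning parameter, not a recommended value.
    """
    shared = [key for key in rec_a if key in rec_b]
    if not shared:
        return False  # nothing to compare, so no evidence of a match
    score = sum(similarity(rec_a[k], rec_b[k]) for k in shared) / len(shared)
    return score >= threshold


# Hypothetical records: one from a static data set, one from a streamed source.
static_record = {"name": "J. Smith", "street": "12 Main St."}
streamed_record = {"name": "j smith", "street": "12  main st"}
same = likely_same_entity(static_record, streamed_record)
```

A production matcher would typically weight attributes differently and handle transposed name parts, which this sketch does not attempt.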
Is this the same guy?
Limited or No Data Governance
• Little or no knowledge of
– Defined data quality criteria
– Edits or controls
– Chain of accountability
• Limited shared definitions
– Typically tabular data dictionaries with nondescript definitions
• Harvested data has no discernible lineage
– Completely devoid of context or production chain
Editorial Bias
• Creating data sets for external consumption involves editorial decisions and biases
• Choices are made about
– The physical structure of the data values
– Which data elements are included
– Which are excluded from the final artifact
Selection criteria
Absence of Metadata
• Numerous data sources have little or no metadata at all
– Dynamically harvested tabular data
– Scraped data
– Human-generated content
– Automata-generated content
– Unstructured data artifacts
– Other data artifacts (graphics, images, video, audio, etc.)
Example: Healthcare Provider Data
• NPPES: Provider First Line Business Mailing Address
• Definition:
– “provider’s first line business mailing address”
• Open Payments: Recipient_Primary_Business_Street_Address_Line_1
• Definition:
– “The first line of the primary practice/business street address of the physician or teaching hospital (covered recipient) receiving the payment or other transfer of value.”
• Is “provider” the same as “recipient”?
• Are these conformant data elements?
• Actually, it turns out that the Open Payments data element is sourced from the NPPES data set!
Preparing to Integrate
• Infer the source data sets' metadata
• Determine if the data element inventories are structurally conformable
• Determine if the data element inventories are semantically conformable
Inferring Metadata Using Profiling
• Analysis of data sets, records, data elements, and data values to
– Infer data element types and sizes
– Identify reference value domains
– Make educated guesses about intent/meaning
[Figure: Profiling. An attribute table listing First, Last, Street, City, and State, and a value-frequency table (Value: Count) with A: 12000, I: 10000, L: 7655, X: 3208, N: 120, M: 8.]
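As a rough illustration of the profiling ideas above, the sketch below infers a simple type, a maximum size, and a candidate reference value domain for one column of raw string values. It assumes values arrive as strings (or None); the function names and the `domain_max` cutoff are hypothetical choices made for this example, not a description of any particular profiling tool.

```python
from collections import Counter


def infer_type(values):
    """Guess a simple type name from a list of non-empty string values."""
    def all_parse(cast):
        try:
            for value in values:
                cast(value)
            return True
        except ValueError:
            return False

    if not values:
        return "unknown"
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "decimal"
    return "string"


def profile_column(raw_values, domain_max=20):
    """Profile one column: inferred type, max length, null count, frequencies."""
    present = [v.strip() for v in raw_values if v is not None and v.strip()]
    counts = Counter(present)
    return {
        "inferred_type": infer_type(present),
        "max_length": max((len(v) for v in present), default=0),
        "null_count": len(raw_values) - len(present),
        "distinct": len(counts),
        # A small set of distinct values hints at a reference value domain.
        "candidate_domain": sorted(counts) if 0 < len(counts) <= domain_max else None,
        "frequencies": counts.most_common(),
    }


# Hypothetical sample column with one empty and one missing value.
state_profile = profile_column(["MD", "VA", "MD", "", None])
```

The frequency list is the part that supports "educated guesses about intent": a value that appears only a handful of times in an otherwise small domain is often a data entry error rather than a legitimate code.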
Conformable Data Elements
• Data elements are conformable if they
– Share the same data element concept
– Share the same value domain
– Share the same definition and semantics
• These two data elements are conformable if their definitions are the same!
CountryOfOrigin: 2-character ISO 3166 Country Code
CountryOfManufacture: 2-character ISO 3166 Country Code
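The distinction between structural and semantic conformability can be sketched as below. This is a simplified illustration: the `ElementMetadata` record, the `min_overlap` threshold, and the two definitions attached to the country-code elements are all hypothetical, introduced only to show why matching types and value domains still leaves a definition check to be done.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementMetadata:
    """Inferred or documented metadata for one data element (assumed layout)."""
    name: str
    data_type: str
    max_length: int
    value_domain: frozenset
    definition: str = ""


def structurally_conformable(a, b, min_overlap=0.9):
    """First-cut test: same type, same size, largely overlapping value domains."""
    if a.data_type != b.data_type or a.max_length != b.max_length:
        return False
    union = a.value_domain | b.value_domain
    if not union:
        return True  # no observed values on either side to contradict the match
    overlap = len(a.value_domain & b.value_domain) / len(union)
    return overlap >= min_overlap


def needs_semantic_review(a, b):
    """Flag pairs that agree structurally but carry different definitions."""
    same_definition = a.definition.strip().lower() == b.definition.strip().lower()
    return structurally_conformable(a, b) and not same_definition


# Both elements hold 2-character country codes; the definitions are invented.
codes = frozenset({"US", "DE", "JP"})
origin = ElementMetadata("CountryOfOrigin", "string", 2, codes,
                         "Country where the product originated")
manufacture = ElementMetadata("CountryOfManufacture", "string", 2, codes,
                              "Country where the product was manufactured")
review = needs_semantic_review(origin, manufacture)
```

Here the two elements pass the structural test, yet the differing definitions mean a data steward still has to decide whether they can be merged.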
Using Metadata to Test Conformability
• Inferred structural metadata provides the first cut at determining whether two data elements are conformable
• Introduce internal governance and management around external metadata
– Use a metadata repository to capture inferred metadata
– Define policies for identification, assessment, documentation, and use of external data sources
– Institute stewardship for each external data source for process management, validation, and maintenance
• Select a metadata tool that provides
– Enterprise-wide metadata visibility
– Integration with data assessment tools
– Historical lineage for metadata capture
– Collaboration among data consumers
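One way to picture a repository that captures inferred metadata with stewardship and historical lineage is the in-memory sketch below. It is not modeled on any specific metadata tool; the class names, fields, and the sample source name are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MetadataVersion:
    """One captured snapshot of inferred metadata for a source."""
    captured_at: datetime
    attributes: dict


@dataclass
class SourceEntry:
    """An external data source, its steward, and its metadata history."""
    source_name: str
    steward: str
    history: list = field(default_factory=list)

    def capture(self, attributes: dict) -> None:
        """Append a timestamped copy so earlier versions are never overwritten."""
        snapshot = MetadataVersion(datetime.now(timezone.utc), dict(attributes))
        self.history.append(snapshot)

    def current(self):
        """Return the most recently captured attributes, or None if empty."""
        return self.history[-1].attributes if self.history else None


class MetadataRepository:
    """Minimal registry keyed by source name."""

    def __init__(self):
        self._entries = {}

    def register(self, source_name: str, steward: str) -> SourceEntry:
        """Create the entry on first use; later calls return the existing one."""
        return self._entries.setdefault(source_name, SourceEntry(source_name, steward))

    def get(self, source_name: str) -> SourceEntry:
        return self._entries[source_name]


# Hypothetical usage: two profiling runs against the same external source.
repo = MetadataRepository()
entry = repo.register("external_provider_feed", steward="data.steward")
entry.capture({"street_address": {"type": "string", "max_length": 55}})
entry.capture({"street_address": {"type": "string", "max_length": 60}})
```

Keeping every snapshot rather than overwriting the latest one is what makes it possible to see when an external source changed its structure.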
Questions & Suggestions
• www.knowledge-integrity.com
• www.dataqualitybook.com
• www.decisionworx.com
• If you have questions, comments, or suggestions, please contact me
David Loshin
301-754-6350
EMBARCADERO TECHNOLOGIES
Joy Ruff
Product Marketing Manager | ER/[email protected]
ER/Studio Team Server Overview
Keeping pace with the rapid growth of data, change and compliance
• Evolving Database Ecosystems
• Volume, Velocity, Variety
• Agile Development Cycles
• Maximizing IT Infrastructure
• Compliance
• Limited Resources
Database Professionals Need the Right Tools
Share Models & Metadata with Business & IT
Team Server
ER Repository
Modeling Teams
• Business Analysts
• Executives
• App and DB Developers
• Data Stewards
• DBAs
• Powerful enterprise glossary & metadata collaboration
• Integrate key business terms and definitions with business systems
• View, store, and manage a single source of business definitions
• Attach business policies to daily workflows with contextual alerts and tips
The Power of Unlimited Involvement
• Use business terms to easily locate and relate information assets
• Maintain enterprise glossaries, terms, and underlying metadata in a central interface
• Enable a consistent flow of information and collaboration around data management
[Diagram labels: Contributors (Business, Architecture, IT); Definition, Structure, Deployment; Syndication, Collaboration, Integration; Consumers (Executive, Analyst, Developer)]
Benefit of Relating Metadata to Models
• Expand the depth of information by accessing the underlying framework
• Models and terms seamlessly integrate to one another
The Primary Resource for Data Information
• Manage a single source of business definitions in an enterprise glossary
• Avoid the issue of information stagnation
• Improve productivity and accuracy in data analysis, application, BI and ETL development
Empowering the Organization
© 2014 Embarcadero Technologies, Inc. Embarcadero, the Embarcadero Technologies logos, and all other Embarcadero Technologies product or service names are trademarks or registered trademarks of Embarcadero Technologies, Inc. All other trademarks are property of their respective owners. | 102714

Embarcadero Technologies has been committed to developing industry-leading tools in the database management and architecture space for over 20 years. Our ER/Studio Team Server Core environment is the next step on that journey, offering modeling and metadata collaboration and management. Your IT and business users gain visibility to existing data assets at a deeper level, enabling their leverage as the critical decision-making assets they can and should be, across the enterprise. If you found Portal to be useful, you’re going to love Team Server Core.

The added functionality of Team Server Core includes unlimited web user read/write access, so that all stakeholders in the company can contribute to and access the critical models, metadata, and the enterprise data dictionary (glossary). Security and data rights management have also been enhanced, so you can have complete confidence that your data is protected and shared with the right people at the right times and in the right formats.
Product Feature: Definition (compared across Team Server Core and Portal)
• Inline Definitions: Integrate enterprise business definitions with data management tools and internal web assets into daily workflows
• Privacy and Security Alerts: Adhere to industry regulations and business standards regarding security and privacy by alerting users who view or modify sensitive data within integrated data management tools
• Semantic Mapping: Develop applications and analyses faster by using business terms to easily find data elements
• Mapped Data Source Registry: Generate information maps by relating data models with their data sources and creating a single searchable registry of all available data sources to store information in one place
• Centralized Reporting: Create and share integrated reports using standard templates and a reporting wizard for ad hoc reports
• Team Collaboration: Apply enterprise collaboration capabilities to capture and use corporate knowledge to reduce time identifying and correcting expensive data quality issues
• Model Sharing: Distribute and view models across the organization, and set permissions for visibility of objects
• Enterprise Glossary: View, classify, relate and centrally store authoritative business definitions in an extensible enterprise glossary
• Custom Extensions: Enhance comprehension of business terms and data elements with custom extensions
• Unlimited Access to Metadata: View, share, and update the enterprise glossaries, business terms, and custom attributes, via the web interface, for any business or IT user
• Limit the level of confusion by centralizing glossaries, terms, and object relationships
• Discuss and add to the development of models and metadata
• Track and gain insight into who and what information has changed in the environment
The Right Tools are Everything
Discover the Benefits of the Ultimate Cross-Platform Database Tools
Thank you!
• Learn more about the ER/Studio product family: http://www.embarcadero.com/data-modeling
• Team Server Hosted Trial: http://www.embarcadero.com/products/er-studio/team-server-hosted-trial
• To arrange a demo, please contact Embarcadero Sales: [email protected], (888) 233-2224