Schema-less table & Dynamic Schema

Davide Mauri
[email protected]
@mauridb

Davide Mauri

• Microsoft SQL Server MVP

• Has worked with SQL Server since version 6.5, and on BI since 2003

• Specialized in Data Solution Architecture, Database Design, Performance Tuning, High-Performance Data Warehousing, BI, Big Data

• President of UGISS (Italian SQL Server UG)

• Regular Speaker @ SQL Server events

• Consulting & Training, Mentor @ SolidQ

• E-mail: [email protected]

• Twitter: @mauridb

• Blog: http://sqlblog.com/blogs/davide_mauri/default.aspx

Agenda

• Schema, Schemaless & Implicit Schemas

• Possible solutions

• Conclusion

Schema

• “A priori” definition of data structures

• Allows data to be inserted if and only if it is compatible with the schema

• E.g.: RDBMS tables, XML Schema, classes, structs

Schemaless (?)

• No definition at all of the data you expect to have
  • Unstructured data

• For example: text files, binary files
  • with no metadata and no position-based format

• In one word: chaos

Implicit Schema

• In reality a schema always exists, albeit implicit
  • Otherwise it would be impossible to handle data

Implicit Schema

Any data that doesn't fit this implicit schema will not be manipulated properly, leading to errors.

(Schemaless data structures, Martin Fowler)
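To make the implicit schema concrete, here is a small Python sketch (Python is used only for illustration; the field and function names are hypothetical). The code that reads the documents is where the schema actually lives: it assumes `price` and `qty` keys, so a document that spells a key differently does not fit and has to be rejected or special-cased.

```python
# Schemaless "documents": nothing stops two of them from using different keys.
orders = [
    {"id": 1, "price": 10.0, "qty": 3},
    {"id": 2, "price": 5.0, "quantity": 2},  # same fact, different key name
]


def order_total(order):
    # The implicit schema lives here: this code assumes "price" and "qty" exist.
    return order["price"] * order["qty"]


def safe_totals(docs):
    """Return (totals, rejected_ids); documents that do not fit the implicit schema are rejected."""
    totals, rejected = [], []
    for doc in docs:
        try:
            totals.append(order_total(doc))
        except KeyError:
            rejected.append(doc["id"])
    return totals, rejected


print(safe_totals(orders))  # ([30.0], [2])
```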

Pros

• Flexibility

• Easy to manage
  • actually, almost no management at all

• Easy to extend
  • Just add a new element and you’re done

• Easy to use
  • No mismatch between OOP and other models

Cons

• Schema information is hidden somewhere
  • Scattered all across the codebase

• It’s really difficult to keep under control the chaos that can emerge
  • For example, two different elements that contain the same information:
    • CustomerName and Customer_Name

• You still need a sort of «First Normal Form» to avoid inconsistencies and code inefficiencies
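One way to spot the `CustomerName` / `Customer_Name` kind of drift is to take an inventory of the keys actually present across documents. A minimal sketch, assuming documents are plain Python dicts and assuming a hypothetical normalization rule (ignore case and underscores) for deciding that two keys probably mean the same thing:

```python
from collections import defaultdict

customers = [
    {"CustomerName": "Contoso", "City": "Milan"},
    {"Customer_Name": "Fabrikam", "city": "Rome"},
    {"CustomerName": "Northwind"},
]


def normalize(key):
    # Hypothetical rule: keys that differ only by case or underscores are suspects.
    return key.replace("_", "").lower()


def conflicting_keys(docs):
    """Group the key spellings found in docs by normalized name; keep groups with >1 spelling."""
    groups = defaultdict(set)
    for doc in docs:
        for key in doc:
            groups[normalize(key)].add(key)
    return {norm: sorted(spellings) for norm, spellings in groups.items() if len(spellings) > 1}


print(conflicting_keys(customers))
```

The check only reports suspects; deciding which spelling is the "right" one is exactly the schema decision the slide says cannot be avoided.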

Cons

• It’s really difficult to define and maintain integrity constraints
  • Data integrity is a value that must be preserved!
  • Otherwise we’ll have data, not information
  • XML Schema was born for that specific reason

• Without data integrity, the process of extracting information from data becomes:
  • Difficult
  • Expensive
  • Untrustworthy
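For contrast, an explicit schema lets the database itself enforce integrity. The sketch below uses SQLite through Python's standard `sqlite3` module as a stand-in for SQL Server (the `product` table and the `try_insert` helper are hypothetical): a `CHECK` or `NOT NULL` violation is rejected at insert time instead of being discovered later by the code that reads the data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Explicit schema: NOT NULL and CHECK constraints are declared once, in the database.
conn.execute("""
    CREATE TABLE product (
        id    INTEGER PRIMARY KEY,
        name  TEXT NOT NULL,
        price REAL NOT NULL CHECK (price >= 0)
    )
""")


def try_insert(name, price):
    """Return True if the row was accepted, False if a constraint rejected it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO product (name, price) VALUES (?, ?)", (name, price))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("Laptop", 999.0))  # True
print(try_insert("Broken", -5.0))   # False: CHECK constraint fails
print(try_insert(None, 10.0))       # False: NOT NULL constraint fails
```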

Words of Wisdom

«Schemaless => implicit schema = bad. Prefer an explicit schema»

(Schemaless data structures, Martin Fowler)

But what if we need it anyway?

• What if my use case perfectly fits the need for an implicit schema?

• The only possible solution seems to be the so-called «NoSQL» databases
  • Document database or key-value store?

• How can I integrate it into an already existing database?
  • Integration does not come for free!

Schemaless & RDBMS

• They are (usually) at exact opposite extremes

• Still, it is a very common request
  • CRM, eCommerce, ERPs…

• Schemaless is used not only for pure data persistence

Solutions within an RDBMS

• «Custom» columns
  • Custom1, Custom2

• In-Table Data Structures
  • BLOB, XML, JSON, «Complex» columns

• Entity-Attribute-Value models

«Custom» Columns

• A problem until SQL Server 2008
  • Space is still used for fixed-length columns even if they contain a NULL value

• With SQL Server 2008 the «Sparse Columns» feature comes to help
  • Helps make the schema easily modifiable, even in the presence of existing data

• Changes to the schema must still be done with «ALTER TABLE»
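The «custom columns» approach boils down to adding nullable columns when a new attribute appears. SQLite has no sparse columns, so this Python/`sqlite3` sketch only illustrates the `ALTER TABLE` pattern on a table that already holds data (the `customer` table and `loyalty_level` column are hypothetical); in SQL Server the new column would additionally be declared `SPARSE`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.executemany("INSERT INTO customer (name) VALUES (?)", [("Contoso",), ("Fabrikam",)])

# A new "custom" attribute arrives: the schema change is still an ALTER TABLE.
# Rows that already exist simply get NULL in the new column.
conn.execute("ALTER TABLE customer ADD COLUMN loyalty_level TEXT")
conn.execute("UPDATE customer SET loyalty_level = 'Gold' WHERE name = 'Contoso'")

rows = conn.execute("SELECT name, loyalty_level FROM customer ORDER BY id").fetchall()
print(rows)  # [('Contoso', 'Gold'), ('Fabrikam', None)]
```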

«Custom» Columns

• Sparse Columns
  • Are 100% real columns

• Optionally you can have *all* the Sparse Columns returned as a single XML column
  • «Column Set»
  • Makes development easier

• Do not take space if not used
  • But use more space when used
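Conceptually, a column set exposes only the non-NULL sparse columns of a row, one XML element per column. The following Python function is a hypothetical mimic of that idea, not SQL Server's implementation; the exact XML that SQL Server produces may differ in details.

```python
from xml.sax.saxutils import escape


def column_set_xml(row, sparse_columns):
    """Hypothetical mimic of a column set: each non-NULL sparse column becomes one XML element."""
    parts = []
    for col in sparse_columns:
        value = row.get(col)
        if value is not None:  # NULL sparse columns are simply omitted
            parts.append(f"<{col}>{escape(str(value))}</{col}>")
    return "".join(parts)


sparse = ["Color", "Size", "Weight"]
row = {"id": 7, "Color": "Red", "Size": None, "Weight": 1.5}
print(column_set_xml(row, sparse))  # <Color>Red</Color><Weight>1.5</Weight>
```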

Demo: Dynamic Schema & Sparse Columns

In-Table Data Structures

• Complete support for XML
  • XPath/XQuery
  • XML indexes

• Performance is «Good Enough»
  • But not optimal (compared with the equivalent relational approach)

• Uses a lot of space
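As a rough illustration of querying inside an in-table XML document, the sketch below stores XML as text in SQLite and evaluates an XPath-style predicate with Python's `xml.etree.ElementTree` (which supports only a subset of XPath). The `<attrs>/<a n="...">` document layout and the `products_where` helper are assumptions. Unlike SQL Server's `xml` type methods and XML indexes, every document here is fetched and parsed on the client, which makes the cost of the approach easy to see.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, attrs TEXT)")
conn.executemany(
    "INSERT INTO product (name, attrs) VALUES (?, ?)",
    [
        ("Laptop A", "<attrs><a n='CPU'>i7</a><a n='Display'>15.4</a></attrs>"),
        ("Laptop B", "<attrs><a n='CPU'>i5</a><a n='Display'>13.3</a></attrs>"),
    ],
)


def products_where(attr, value):
    """Return names of products whose XML contains <a n=attr>value</a>.

    Every document is fetched and parsed: nothing here can use an index.
    """
    result = []
    for name, xml_text in conn.execute("SELECT name, attrs FROM product ORDER BY id"):
        node = ET.fromstring(xml_text).find(f"./a[@n='{attr}']")
        if node is not None and node.text == value:
            result.append(name)
    return result


print(products_where("CPU", "i7"))  # ['Laptop A']
```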

In-Table Data Structures

• XML sometimes needs some help to boost performance

• It would be nice to be able to «promote» elements to turn them into real columns
  • This must be done manually, using a choice of:
    • Triggers
    • Stored Procedures
    • Data Access Layer
    • Service Broker
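Of the promotion options listed, the trigger one can be sketched with SQLite: a Python function is registered with `sqlite3.Connection.create_function` so that triggers can call it to copy one XML element into a real, indexable column. The `xml_attr` helper, the table layout and the trigger names are hypothetical; in SQL Server the trigger could use the `xml` type's `value()` method instead.

```python
import sqlite3
import xml.etree.ElementTree as ET


def xml_attr(xml_text, name):
    """Hypothetical helper: text of the <a n="name"> element, or None if absent."""
    if xml_text is None:
        return None
    node = ET.fromstring(xml_text).find(f"./a[@n='{name}']")
    return None if node is None else node.text


conn = sqlite3.connect(":memory:")
# Make the Python helper callable from SQL, and therefore from triggers.
conn.create_function("xml_attr", 2, xml_attr)
conn.executescript("""
    CREATE TABLE product (id INTEGER PRIMARY KEY, attrs TEXT, cpu TEXT);
    CREATE INDEX ix_product_cpu ON product (cpu);

    -- Keep the promoted "cpu" column in sync with the XML document.
    CREATE TRIGGER promote_cpu_ins AFTER INSERT ON product BEGIN
        UPDATE product SET cpu = xml_attr(NEW.attrs, 'CPU') WHERE id = NEW.id;
    END;
    CREATE TRIGGER promote_cpu_upd AFTER UPDATE OF attrs ON product BEGIN
        UPDATE product SET cpu = xml_attr(NEW.attrs, 'CPU') WHERE id = NEW.id;
    END;
""")

conn.execute("INSERT INTO product (attrs) VALUES (?)", ("<attrs><a n='CPU'>i7</a></attrs>",))
conn.execute("INSERT INTO product (attrs) VALUES (?)", ("<attrs><a n='CPU'>i5</a></attrs>",))
conn.execute("UPDATE product SET attrs = ? WHERE id = 1", ("<attrs><a n='CPU'>i9</a></attrs>",))

# The promoted column can now be filtered (and indexed) like any other column.
print(conn.execute("SELECT id, cpu FROM product ORDER BY id").fetchall())  # [(1, 'i9'), (2, 'i5')]
```

The triggers only write the `cpu` column, so they never re-fire the `UPDATE OF attrs` trigger.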

In-Table Data Structures

• JSON support is still missing in SQL Server
  • But other databases like PostgreSQL already have it…
  • …so we can expect it to come to the MS platform too

• Right now one solution is to use SQLCLR
  • Solutions available on the web:
    • http://www.sqlservercentral.com/articles/SQLCLR/74160/
    • http://www.json4sql.com/examples.html

• There is also a pure T-SQL solution
  • https://www.simple-talk.com/sql/t-sql-programming/consuming-json-strings-in-sql-server/
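The SQLCLR route means extending the database engine with a function written in a host language. The same idea can be sketched with SQLite by registering a Python function that parses JSON, so that it can be used in a `WHERE` clause. `json_get` and the `event` table are hypothetical, and this is an analogy, not SQLCLR code.

```python
import json
import sqlite3


def json_get(doc, key):
    """Hypothetical scalar function: doc[key] from a JSON object string, or None."""
    try:
        value = json.loads(doc).get(key)
    except (TypeError, ValueError, AttributeError):  # NULL, invalid JSON, or not an object
        return None
    # SQLite can only receive scalars back, so nested values are re-serialized.
    return json.dumps(value) if isinstance(value, (dict, list)) else value


conn = sqlite3.connect(":memory:")
conn.create_function("json_get", 2, json_get)
conn.execute("CREATE TABLE event (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany("INSERT INTO event (payload) VALUES (?)", [
    ('{"type": "click", "x": 10}',),
    ('{"type": "view"}',),
    ("not json",),
])

rows = conn.execute("SELECT id FROM event WHERE json_get(payload, 'type') = 'click'").fetchall()
print(rows)  # [(1,)]
```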

In-Table Data Structures

• Blob is an option if you just need persistence

• Blobs can be stored in different ways:
  • «Classic» blob inside SQL Server pages & extents
  • Blob in a FILESTREAM
  • Blob in a FileTable
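When persistence is all that is needed, the object can be serialized and stored as opaque bytes. A minimal sketch with SQLite and JSON serialization (the `doc_store` table and the `save`/`load` helpers are hypothetical); note that the database cannot search inside the stored value.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE doc_store (id INTEGER PRIMARY KEY, body BLOB NOT NULL)")


def save(doc):
    """Serialize the whole object and store it as opaque bytes; return its id."""
    data = json.dumps(doc).encode("utf-8")
    cur = conn.execute("INSERT INTO doc_store (body) VALUES (?)", (data,))
    return cur.lastrowid


def load(doc_id):
    """Read the bytes back and deserialize them; the database never looks inside."""
    (data,) = conn.execute("SELECT body FROM doc_store WHERE id = ?", (doc_id,)).fetchone()
    return json.loads(data.decode("utf-8"))


doc_id = save({"CustomerName": "Contoso", "tags": ["vip", "eu"]})
print(load(doc_id))
```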

Demo: Dynamic Schema & In-Table Data Structures

Entity-Attribute-Values

• An old and very common technique to store attribute-value pairs
  • A well-known example: WordPress

• Works on any RDBMS
  • No «special» features required

• There’s a huge debate around it
  • http://en.wikipedia.org/wiki/Entity%E2%80%93attribute%E2%80%93value_model
  • But until SQL Server 2005 there was no true alternative
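A minimal EAV sketch, with SQLite standing in for SQL Server and with hypothetical table names: attributes live in rows instead of columns, and rebuilding a «normal» row needs one conditional aggregate per attribute. Values use the «all strings» option.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE entity (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
    -- One row per (entity, attribute) pair; every value stored as a string.
    CREATE TABLE eav (
        entity_id INTEGER NOT NULL REFERENCES entity (id),
        attribute TEXT    NOT NULL,
        value     TEXT,
        PRIMARY KEY (entity_id, attribute)
    );
""")
conn.executemany("INSERT INTO entity (id, name) VALUES (?, ?)",
                 [(1, "Laptop A"), (2, "Laptop B"), (3, "Laptop C")])
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    (1, "CPU", "i7"), (1, "Display", "15.4"), (1, "RAM", "16"),
    (2, "CPU", "i7"), (2, "Display", "13.3"),
    (3, "CPU", "i5"), (3, "Display", "15.4"),
])

# Turning rows back into columns ("pivot") needs one conditional aggregate per attribute.
pivot = conn.execute("""
    SELECT e.name,
           MAX(CASE WHEN v.attribute = 'CPU'     THEN v.value END) AS cpu,
           MAX(CASE WHEN v.attribute = 'Display' THEN v.value END) AS display
    FROM entity AS e JOIN eav AS v ON v.entity_id = e.id
    GROUP BY e.id, e.name
    ORDER BY e.id
""").fetchall()
print(pivot)
```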

Entity-Attribute-Values

Entity-Attribute-Values

• Offers maximum flexibility
  • No real control over data types

• Options to deal with data types:
  • All strings
  • SQL Variant
  • One-Column-Per-Type

• Complex query pattern for «AND» predicates between attributes
  • «Return all the entities that have CPU=i7 and Display=15.4"»
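The «AND» pattern is where EAV queries get awkward. In this sketch (SQLite via Python, with a hypothetical `eav` table) every attribute condition needs its own alias of the table, so each extra condition adds another self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE eav (entity_id INTEGER, attribute TEXT, value TEXT, "
             "PRIMARY KEY (entity_id, attribute))")
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    (1, "CPU", "i7"), (1, "Display", "15.4"), (1, "RAM", "16"),
    (2, "CPU", "i7"), (2, "Display", "13.3"),
    (3, "CPU", "i5"), (3, "Display", "15.4"),
])

# «CPU = i7» AND «Display = 15.4»: one alias (and one self-join) per condition.
rows = conn.execute("""
    SELECT cpu.entity_id
    FROM eav AS cpu
    JOIN eav AS dsp ON dsp.entity_id = cpu.entity_id
    WHERE cpu.attribute = 'CPU'     AND cpu.value = 'i7'
      AND dsp.attribute = 'Display' AND dsp.value = '15.4'
""").fetchall()
print(rows)  # [(1,)]
```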

Entity-Attribute-Values

• Queries require a relational operator that is not implemented in common RDBMSs
  • «Relational Division»

• Documented and well explained in theory
  • It is quite easy to implement: follow the theory, then add some pepper to boost performance
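One formulation often used in practice puts the divisor in a table of required attribute/value pairs and counts matches with `GROUP BY ... HAVING`. This is a common alternative, not necessarily the «pepper» used in the session's demo; the `eav` and `wanted` tables are hypothetical. With the primary keys shown, each required pair can match at most once per entity, so `COUNT(*)` is safe.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE eav (entity_id INTEGER, attribute TEXT, value TEXT,
                      PRIMARY KEY (entity_id, attribute));
    -- The divisor: the attribute/value pairs every answer must have.
    CREATE TABLE wanted (attribute TEXT, value TEXT, PRIMARY KEY (attribute, value));
""")
conn.executemany("INSERT INTO eav VALUES (?, ?, ?)", [
    (1, "CPU", "i7"), (1, "Display", "15.4"), (1, "RAM", "16"),
    (2, "CPU", "i7"), (2, "Display", "13.3"),
    (3, "CPU", "i5"), (3, "Display", "15.4"),
])
conn.executemany("INSERT INTO wanted VALUES (?, ?)", [("CPU", "i7"), ("Display", "15.4")])

# An entity qualifies when it matches as many required pairs as the divisor holds.
DIVISION_SQL = """
    SELECT v.entity_id
    FROM eav AS v
    JOIN wanted AS w ON w.attribute = v.attribute AND w.value = v.value
    GROUP BY v.entity_id
    HAVING COUNT(*) = (SELECT COUNT(*) FROM wanted)
"""
rows = conn.execute(DIVISION_SQL).fetchall()
print(rows)  # [(1,)]
```

Adding a third condition only means inserting one more row into `wanted`; the query text stays the same.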

Relational Division

• Let’s get back to theory a little bit, in order to see the problem from a more open perspective:

[Diagram: the parts of a relational division (Dividend, Divisor, Result, Remainder)]

Relational Division

• How do we implement the division?

• Thanks to Codd and the relational theory we already have the solution
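In relational algebra, with a dividend R(A, B) and a divisor S(B), division can be written using only projection, Cartesian product and difference. In EAV terms, A is the entity and B the attribute/value pair.

```latex
R \div S \;=\; \pi_A(R) \;-\; \pi_A\big( (\pi_A(R) \times S) - R \big)
```

Reading from the inside out: \(\pi_A(R) \times S\) builds every possible pairing, subtracting R leaves the pairings that do not exist, and projecting those on A gives the candidates that are not answers.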

Relational Division

• Thanks to relational algebra we know that the division can be expressed as:
  • Generate all possible pairings
  • Remove the existing pairings
    • (Now we’ve found all pairings that are NOT answers)
  • Remove the non-answers from the dividend
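The three steps map directly onto set operations. A minimal Python sketch using plain sets (the sample data and the `divide` function are hypothetical); an empty divisor returns every candidate, as in the algebraic definition.

```python
# Dividend R(A, B): (entity, attribute/value pair); divisor S(B): the pairs we require.
dividend = {
    (1, ("CPU", "i7")), (1, ("Display", "15.4")), (1, ("RAM", "16")),
    (2, ("CPU", "i7")), (2, ("Display", "13.3")),
    (3, ("CPU", "i5")), (3, ("Display", "15.4")),
}
divisor = {("CPU", "i7"), ("Display", "15.4")}


def divide(dividend, divisor):
    candidates = {a for a, _ in dividend}                         # pi_A(R)
    all_pairings = {(a, b) for a in candidates for b in divisor}  # 1. all possible pairings
    missing = all_pairings - dividend                             # 2. remove existing pairings
    non_answers = {a for a, _ in missing}                         #    entities lacking something
    return candidates - non_answers                               # 3. remove the non-answers


print(divide(dividend, divisor))  # {1}
```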

Demo: Dynamic Schema & EAV

Conclusions

• It works!
  • Performance is more than good

• Choose the solution that best fits your use case:
  • Search for attributes only?
  • Persistence only?
  • Search for attributes & values?
  • Performance: read, write, or read/write?

Conclusions

• Use it if and only if it is really needed

• Always remember the «Words of Wisdom»
  • If you can, define and use a schema
  • It may seem «not cool» and convoluted, but in the long term it is the best solution

• *data* *must* *be* *turned* *into* *information*
  • Sooner or later
  • Without metadata (a schema) it’s really really really hard!

Questions?

Thanks!

• If you want to rate this session on my SpeakerScore page:

• www.speakerscore.com

• Feedback Key: TZQL

Demo Material

• Can be found here:
  • http://1drv.ms/1Av5mb5

• Everything is released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International (CC BY-NC-SA 4.0) license
  • http://creativecommons.org/licenses/by-nc-sa/4.0/
