I've Always Wanted To Data Model - Data Week 2013

DESCRIPTION

One of the tenets of Big Data is that it allows developers to work with "unstructured" data. But unless you're piping /dev/random, there's no such thing as *truly* unstructured data; only data whose structure you don't understand yet. In this lightning talk, we'll take a tour of the core fundamentals of deep data structure modeling, and see how the rigid tools and techniques of the past have failed us in the modern world of agile software and big data. We'll delve into what hope there is for understanding the semantics and structure of data that doesn't play by the rules of an RDBMS.

I’ve Always Wanted To Data Model

Ian Varley, Salesforce.com

Data Week, 2013-10-02

Lightning Talk (10 minutes)

Who am I?

Ian Varley, Austin, TX

Salesforce.com Big Data Team, @thefutureian

What’s Data Modeling?

The act of taking the intelligible structure of the world around us, and making it concrete enough for computers to act on it.

(More specifically, data modeling usually has to do with storing it in a database.)

Traditionally, data modeling has meant Entity-Attribute-Relationship modeling techniques.

There are variants that are more “OO” (like UML) but they share most of the same core assumptions.

Many a project was sunk due to shitty data modeling.

It’s a difficult occupation. You have to be part engineer, part psychologist, and part philosopher.

But.

The expressive power of our conceptual modeling techniques hasn’t improved much since the 1970s.

We mostly look at the world in the same static way we did 40 years ago.

Partly, this is because our discipline is wedded to relational (SQL) DBs.

When the only tool you have is a hammer ...

A book that opened my eyes ...

(He said a lot of the stuff I’m about to say back in 1978!)

I don’t have a lot of answers. But I want to raise some questions.

And hopefully, start a conversation.

Here are 5 observations about the tools of traditional data modeling.

#1: nobody actually knows what an “entity” really is.

“Entity” is another word for Category, in linguistics terms.

And an important property of linguistic categories is that they are slippery.

See:
● Steven Pinker: The Stuff Of Thought
● Douglas Hofstadter: Surfaces & Essences
● George Lakoff: Women, Fire, and Dangerous Things

part: an abstract definition of a connected set of physical materials that serve some purpose, and that people are willing to buy

part: one instance of a part type, which arrives on the QA line at a specific time and either does or doesn't meet quality standards
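To make the slipperiness concrete, here is a minimal sketch of those two readings of "part" as two separate types. The names `PartType` and `PartUnit`, and all field names and sample values, are made up for illustration; nothing here comes from a real schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PartType:
    """'Part' in the catalog sense: an abstract design people will buy."""
    sku: str
    description: str


@dataclass
class PartUnit:
    """'Part' in the QA sense: one physical instance of a PartType."""
    part_type: PartType
    arrived_at: str  # ISO-8601 timestamp, kept as a string for simplicity
    passed_qa: bool


# Hypothetical sample data.
widget = PartType(sku="W-100", description="left-handed widget")
units = [
    PartUnit(widget, "2013-10-02T09:00:00", True),
    PartUnit(widget, "2013-10-02T09:05:00", False),
]

# A question like "how good is this part?" only makes sense once you
# know which meaning of "part" you are asking about.
pass_rate = sum(u.passed_qa for u in units) / len(units)
```

Both types are reasonable things to call "part"; the model only works once someone decides which one a given conversation means.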

And if you think you can “solve” the problem, I’ve got some World Trade Center insurance policies to sell you.

That said, there are a couple tools we could adopt that would help:

● First-class Sub- / Super-Typing
● First-class Scoping and Aliasing

(Not that there aren’t ways to do this in ERD models, but they’re unobvious and not widely used.)
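One guess at what "first-class" sub-typing and scoping could look like, sketched in Python. The scope names (`sales`, `qa`), the class names, and the `resolve` helper are all assumptions made up for this example, not features of any existing modeling tool.

```python
class CatalogItem:
    """Supertype: anything that can appear in the sales catalog."""


class PartType(CatalogItem):
    """Subtype: a sellable part design."""


class PartUnit:
    """A physical unit on the QA line; not a catalog item at all."""


# Scoping and aliasing: the same word maps to a different type
# depending on which part of the business is speaking.
ALIASES = {
    "sales": {"part": PartType},
    "qa": {"part": PartUnit},
}


def resolve(scope: str, term: str) -> type:
    """Look up what `term` means inside `scope`."""
    return ALIASES[scope][term]
```

Here `resolve("sales", "part")` and `resolve("qa", "part")` return different types, and only the sales meaning is a `CatalogItem`.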

#2: entities, attributes, and relationships are really the same thing, maaaan ...

http://the-hippie-portfolio.tumblr.com/

Say I’ve got a “parent” in my model.

Is it:
● A “parent” entity?
● A “person” entity with an “isParent” attribute?
● Two “person” entities in a “parent” relationship?

It’s all of them; the distinction is arbitrary.

The real structure is just a graph … but none of our modeling tools are that flexible, nor is it helpful to think that abstractly about most software.
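As a rough illustration of "it's all of them", here is a sketch that stores the underlying graph as plain (subject, predicate, object) triples and derives each of the three readings of "parent" from it. The triples, predicate names, and function names are invented for this example.

```python
# The "real" structure: a tiny graph stored as triples.
triples = {
    ("alice", "is_a", "person"),
    ("bob", "is_a", "person"),
    ("alice", "parent_of", "bob"),
}


def parent_entities(ts):
    """View 1: 'parent' as its own entity set."""
    return {s for s, p, o in ts if p == "parent_of"}


def person_records(ts):
    """View 2: 'person' entities carrying an is_parent attribute."""
    parents = parent_entities(ts)
    return {
        s: {"is_parent": s in parents}
        for s, p, o in ts
        if p == "is_a" and o == "person"
    }


def parent_relationships(ts):
    """View 3: pairs of persons joined by a 'parent' relationship."""
    return {(s, o) for s, p, o in ts if p == "parent_of"}
```

All three views are projections of the same data; picking one as "the model" is a convenience, not a discovery.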

Normally, we make the choice based on our experience and gut feeling, and pretend there’s a science to it.

But the whole way of thinking is a convenience based on “records”.

I have no idea what to do about this.

Tools that allow you to view any part of your model in any of those ways?

This isn’t realistic with today’s tools, so this is just idle speculation.

#3: prescriptive models encourage black & white thinking in a gray world

You have to make decisions (about entities, attributes, relationships, types) up front. But sometimes that’s not right.

This is a strength of (some) NoSQL databases: you can do data first, and surface structure later.
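A minimal sketch of "data first, structure later": store records as loose dictionaries, then ask afterwards which fields and value types actually showed up. The `surface_structure` function and the sample records are hypothetical, and real document stores do not necessarily work this way.

```python
from collections import defaultdict


def surface_structure(records):
    """Report each field seen in the data and the value types it held."""
    fields = defaultdict(set)
    for record in records:
        for key, value in record.items():
            fields[key].add(type(value).__name__)
    return {key: sorted(types) for key, types in fields.items()}


# Hypothetical records written with no schema agreed up front.
records = [
    {"name": "Ada", "age": 36},
    {"name": "Bob", "age": "unknown", "dept": "R&D"},
]

schema = surface_structure(records)
```

The result shows that `age` is sometimes an `int` and sometimes a `str`, which is exactly the kind of gray area an up-front model would have forced you to paper over.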

Sometimes the deep structure is actually ambiguous.

This can apply broadly. (What if an employee isn’t really “in” a department, but has flexible membership based on where she spends her time?)

You can represent that in a traditional data model, sure.

But you’re not encouraged to.
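For instance, the flexible-membership idea could be modeled as weighted membership rows rather than a single department column. This is only one possible shape; `Membership`, `fraction`, and the sample data are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Membership:
    employee: str
    department: str
    fraction: float  # share of the employee's time, between 0.0 and 1.0


# Hypothetical data: one employee split across two departments.
memberships = [
    Membership("dana", "engineering", 0.6),
    Membership("dana", "support", 0.4),
]


def departments_of(employee, ms):
    """Every department the employee belongs to, with time share."""
    return {m.department: m.fraction for m in ms if m.employee == employee}


def primary_department(employee, ms):
    """The black-and-white answer, derived only when someone needs it."""
    shares = departments_of(employee, ms)
    return max(shares, key=shares.get)
```

The single-department answer is still available, but it becomes a derived view instead of a decision baked into the model.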

#4: static models make the time dimension unwieldy

Entity models are generally silent on the ways data changes.

Many modern databases can keep older versions of objects.

But should they? For which entities? How many versions? Etc.

Worse, what about when the model changes at runtime, and you need to also retain knowledge of what the old model was?

As in #3, there are ways to model this in entity models, but it’s not easy, so most people just don’t think about it.
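One way to sketch the time dimension is an append-only record where every write is stamped with a time and with the model version it was written under. `VersionedRecord`, the integer timestamps, and the sample fields are all hypothetical, and the sketch assumes writes arrive in increasing time order.

```python
import bisect


class VersionedRecord:
    """Append-only history of one object, tagged with schema versions."""

    def __init__(self):
        self._times = []     # write timestamps, assumed increasing
        self._versions = []  # (schema_version, data) per write

    def write(self, ts, schema_version, data):
        self._times.append(ts)
        self._versions.append((schema_version, dict(data)))

    def as_of(self, ts):
        """Return (schema_version, data) as it stood at time ts, or None."""
        i = bisect.bisect_right(self._times, ts)
        if i == 0:
            return None
        return self._versions[i - 1]


rec = VersionedRecord()
rec.write(1, 1, {"dept": "eng"})             # model v1: one department
rec.write(5, 2, {"depts": ["eng", "ops"]})   # model v2: many departments
```

Reading `rec.as_of(3)` gives back both the old data and the old model version, so a reader knows which shape to expect. How many versions to keep, and for which entities, is still an open choice.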

#5: boxes & lines aren’t how we actually think

Our spatial processing of diagrams doesn’t map well to our temporal, spatial, and causal comprehension of data structure.

What do people really do?

Skip making models when their models look too complicated.

F*** THAT NOISE.

Is there an alternative? Not yet.

What could move the needle?
● Prototype-based modeling
● Proper scoping
● Semantic zooming
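The slides don't spell out what prototype-based modeling would mean for data. One possible reading, borrowed from prototype-based programming languages, is that objects inherit from concrete examples rather than from classes declared up front. The `Proto` class below is a made-up sketch of that reading.

```python
class Proto:
    """An object that delegates missing slots to its prototype."""

    def __init__(self, proto=None, **slots):
        self.proto = proto
        self.slots = dict(slots)

    def get(self, name):
        # Walk up the prototype chain until some object has the slot.
        obj = self
        while obj is not None:
            if name in obj.slots:
                return obj.slots[name]
            obj = obj.proto
        raise KeyError(name)


# Start from a concrete example, then specialize it.
person = Proto(kind="person", can_be_parent=True)
alice = Proto(person, name="Alice")
robot = Proto(person, kind="robot", can_be_parent=False)
```

Here `alice` picks up `kind` from `person` without anyone declaring a schema, and `robot` overrides what doesn't fit. Whether this scales to real data models is an open question.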

The map is not the territory.

In conclusion … if you dig this stuff, let’s talk!

@thefutureian