Upload
others
View
1
Download
0
Embed Size (px)
Citation preview
1
Wednesday, October 6, 2010
2Image credit: http://browsertoolkit.com/fault-tolerance.png
Wednesday, October 6, 2010
3Image credit: http://browsertoolkit.com/fault-tolerance.png
Wednesday, October 6, 2010
4Image credit: http://browsertoolkit.com/fault-tolerance.png
Wednesday, October 6, 2010
NOSQL- an overview -goto; con 2010
Emil EifremCEO, Neo Technology
Wednesday, October 6, 2010
So what’s the plan?๏Why NOSQL?
๏The NOSQL landscape
๏NOSQL challenges
๏Conclusion
6
Wednesday, October 6, 2010
First off: the name
7
๏WE ALL HATES IT, M’KAY?
Wednesday, October 6, 2010
NOSQL is NOT...
8
Wednesday, October 6, 2010
NOSQL is NOT...
๏ NO to SQL
8
Wednesday, October 6, 2010
NOSQL is NOT...
๏ NO to SQL
๏ NEVER SQL
8
Wednesday, October 6, 2010
Not Only SQL
9
NOSQL is simply
Wednesday, October 6, 2010
Wednesday, October 6, 2010
11
Four trends
NOSQL - Why now?
Wednesday, October 6, 2010
Trend 1:data set size
Source: IDC 2007200740
Wednesday, October 6, 2010
200740
2010
988
Source: IDC 2007
Trend 1:data set size
Wednesday, October 6, 2010
Trend 2: Connectedness
141990
Info
rmat
ion
conn
ectiv
ity
2000 2010 2020
web 1.0 web 2.0 “web 3.0”
Wednesday, October 6, 2010
Trend 2: Connectedness
15
Text documents
1990
Info
rmat
ion
conn
ectiv
ity
2000 2010 2020
web 1.0 web 2.0 “web 3.0”
Wednesday, October 6, 2010
Trend 2: Connectedness
16
Text documents
1990
Info
rmat
ion
conn
ectiv
ity
Hypertext
2000 2010 2020
web 1.0 web 2.0 “web 3.0”
Wednesday, October 6, 2010
Trend 2: Connectedness
17
Text documents
1990
Info
rmat
ion
conn
ectiv
ity Folksonomies
Tagging
User-generated content
Wikis
RSS
Blogs
Hypertext
2000 2010 2020
web 1.0 web 2.0 “web 3.0”
Wednesday, October 6, 2010
Trend 2: Connectedness
18
Text documents
1990
Info
rmat
ion
conn
ectiv
ity Folksonomies
Tagging
User-generated content
Wikis
RSS
Blogs
Hypertext
2000 2010 2020
web 1.0 web 2.0 “web 3.0”
Ontologies
RDF
GiantGlobal
Graph (GGG)
Wednesday, October 6, 2010
Trend 2: Connectedness
19
Text documents
1990
Info
rmat
ion
conn
ectiv
ity Folksonomies
Tagging
User-generated content
Wikis
RSS
Blogs
Hypertext
2000 2010 2020
web 1.0 web 2.0 “web 3.0”
Ontologies
RDF
GiantGlobal
Graph (GGG)
Wednesday, October 6, 2010
Trend 3: Semi-structure
20
๏ Individualization of content
• In the salary lists of the 1970s, all elements had exactly one job
• In the salary lists of the 2000s, we need 5 job columns! Or 8? Or 15?
๏All encompassing “entire world views”
• Store more data about each entity
๏Trend accelerated by the decentralization of content generation that is the hallmark of the age of participation (“web 2.0”)
Wednesday, October 6, 2010
Aside: RDBMS performance
21Data complexity
Perfo
rman
ce
Relational database
Requirement of application
Wednesday, October 6, 2010
Aside: RDBMS performance
22Data complexity
Perfo
rman
ce
Relational database
Requirement of application
Wednesday, October 6, 2010
Aside: RDBMS performance
23Data complexity
Perfo
rman
ce
Salary List Relational database
Requirement of application
Wednesday, October 6, 2010
Aside: RDBMS performance
24Data complexity
Perfo
rman
ce
Majority ofWebapps
Salary List Relational database
Requirement of application
Wednesday, October 6, 2010
Aside: RDBMS performance
25Data complexity
Perfo
rman
ce
Majority ofWebapps
Social network
Semantic Trading
Salary List
}custom
Relational database
Requirement of application
Wednesday, October 6, 2010
Trend 4: Architecture
26
DB
Application
1980s: Application (<-- note lack of s)
Wednesday, October 6, 2010
Trend 4: Architecture
27
DB
Application
1990s: Database as integration hub
Application Application
Wednesday, October 6, 2010
DBDB DB
Trend 4: Architecture
28
Service
2000s: (moving towards) Decoupled serviceswith their own backend
Service Service
Wednesday, October 6, 2010
Why NOSQL Now?
๏Trend 1: Size
๏Trend 2: Connectedness
๏Trend 3: Semi-structure
๏Trend 4: Architecture
29
Wednesday, October 6, 2010
Four NOSQL categories
30
Wednesday, October 6, 2010
Category 1: Key-Value stores
31
๏Lineage:
• “Dynamo: Amazon’s Highly Available Key-Value Store” (2007)
๏Data model:
•Global key-value mapping
•Think: Globally available HashMap/Dict/etc
๏Examples:
• Project Voldemort
•Tokyo {Cabinet, Tyrant, etc}
Wednesday, October 6, 2010
Category 1: Key-Value stores
32
๏Strengths
• Simple data model
•Great at scaling out horizontally
๏Weaknesses:
• Simplistic data model
• Poor for complex data
Wednesday, October 6, 2010
Category II: ColumnFamily (BigTable) stores
33
๏Lineage:
• “Bigtable: A Distributed Storage System for Structured Data” (2006)
๏Data model:
•A big table, with column families
๏Examples:
•HBase
•HyperTable
•CassandraWednesday, October 6, 2010
Category III: Document databases
34
๏Lineage:
• Lotus Notes
๏Data model:
•Collections of documents
•A document is a key-value collection
๏Examples:
•CouchDB
•MongoDB
Wednesday, October 6, 2010
Document db: An example
35
๏How would we model a blogging software?
๏One stab:
•Represent each Blog as a Collection of Post documents
•Represent Comments as nested documents in the Post documents
Wednesday, October 6, 2010
Document db: Creating a blog post
36
import com.mongodb.Mongo;import com.mongodb.DB;import com.mongodb.DBCollection;import com.mongodb.BasicDBObject;import com.mongodb.DBObject;// ...Mongo mongo = new Mongo( "localhost" ); // Connect to MongoDB// ...DB blogs = mongo.getDB( "blogs" ); // Access the blogs databaseDBCollection myBlog = blogs.getCollection( "Thobe’s blog" );
DBObject blogPost = new BasicDBObject();blogPost.put( "title", "JAOO^H^H^H^HGoto; con 2010" );blogPost.put( "pub_date", new Date() );blogPost.put( "body", "Publishing a post about JA...Goto con in
my MongoDB blog!" );blogPost.put( "tags", Arrays.asList( "conference", "names" ) );blogPost.put( "comments", new ArrayList() );
myBlog.insert( blogPost );
Wednesday, October 6, 2010
Retrieving posts// ...import com.mongodb.DBCursor;// ...
public Object getAllPosts( String blogName ) {DBCollection blog = db.getCollection( blogName );return renderPosts( blog.find() );
}
private Object renderPosts( DBCursor cursor ) {// order by publication date (descending)cursor = cursor.sort( new BasicDBObject( "pub_date", -1 ) );// ...
}
37
Wednesday, October 6, 2010
Category IV: Graph databases
38
๏Lineage:
• Euler and graph theory
๏Data model:
•Nodes with properties
•Typed relationships with properties
๏Examples:
• Sones GraphDB
• InfiniteGraph
•Neo4jWednesday, October 6, 2010
Property Graph model
39
Wednesday, October 6, 2010
Property Graph model
40
LIVES WITHLOVES
OWNSDRIVES
LOVES
Wednesday, October 6, 2010
Property Graph model
41
LIVES WITHLOVES
OWNSDRIVES
LOVESname: “James”age: 32twitter: “@spam”
name: “Mary”age: 35
brand: “Volvo”model: “V70”
property type: “car”
Wednesday, October 6, 2010
Graphs are whiteboard friendly
42Image credits: Tobias Ivarsson
An application domain model outlined on a whiteboard or piece of paper would be translated to an ER-diagram, then normalized to fit a Relational Database.With a Graph Database the model from the whiteboard is implemented directly.
Wednesday, October 6, 2010
Graphs are whiteboard friendly
43
thobe
Wardrobe Strength
Joe project blog
Hello Joe
Neo4j performance analysis
Modularizing Jython
Image credits: Tobias Ivarsson
An application domain model outlined on a whiteboard or piece of paper would be translated to an ER-diagram, then normalized to fit a Relational Database.With a Graph Database the model from the whiteboard is implemented directly.
Wednesday, October 6, 2010
Graph db: Creating a social graph
44
GraphDatabaseService graphDb = new EmbeddedGraphDatabase(GRAPH_STORAGE_LOCATION );
Transaction tx = graphDb.beginTx();try {
Node mrAnderson = graphDb.createNode();mrAnderson.setProperty( "name", "Thomas Anderson" );mrAnderson.setProperty( "age", 29 );
Node morpheus = graphDb.createNode();morpheus.setProperty( "name", "Morpheus" );morpheus.setProperty( "rank", "Captain" );
Relationship friendship = mrAnderson.createRelationshipTo(morpheus, SocialGraphTypes.FRIENDSHIP );
tx.success();} finally {
tx.finish();}
Wednesday, October 6, 2010
Graph db: How do I know this person?Node me = ...Node you = ...
PathFinder shortestPathFinder = GraphAlgoFactory.shortestPath(Traversals.expanderForTypes(
SocialGraphTypes.FRIENDSHIP, Direction.BOTH ),/* maximum depth: */ 4 );
Path shortestPath = shortestPathFinder.findSinglePath(me, you);
for ( Node friend : shortestPath.nodes() ) {System.out.println( friend.getProperty( "name" ) );
}
45
Wednesday, October 6, 2010
Graph db: Recommend new friendsNode person = ...
TraversalDescription friendsOfFriends = Traversal.description().expand( Traversals.expanderForTypes(
SocialGraphTypes.FRIENDSHIP, Direction.BOTH ) ).prune( Traversal.pruneAfterDepth( 2 ) ).breadthFirst() // Visit my friends before their friends.//Visit a node at most once (don’t recommend direct friends).uniqueness( Uniqueness.NODE_GLOBAL ).filter( new Predicate<Path>() {
// Only return friends of friendspublic boolean accept( Path traversalPos ) {
return traversalPos.length() == 2;}
} );
for ( Node recommendation : friendsOfFriends.traverse( person ).nodes() ) {System.out.println( recommendedFriend.getProperty("name") );
} 46
Wednesday, October 6, 2010
Four emerging NOSQL categories
๏Key-Value stores
๏ColumnFamiy stores
๏Document databases
๏Graph databases
47
Wednesday, October 6, 2010
Scaling to size vs. Scaling to complexity
48
Size
Complexity
Key/Value stores
Wednesday, October 6, 2010
Scaling to size vs. Scaling to complexity
48
Size
Complexity
Key/Value stores
ColumnFamily stores
Wednesday, October 6, 2010
Scaling to size vs. Scaling to complexity
48
Size
Complexity
Key/Value stores
ColumnFamily stores
Document databases
Wednesday, October 6, 2010
Scaling to size vs. Scaling to complexity
48
Size
Complexity
Key/Value stores
ColumnFamily stores
Document databases
Graph databases
Wednesday, October 6, 2010
Scaling to size vs. Scaling to complexity
48
Size
Complexity
Key/Value stores
ColumnFamily stores
Document databases
Graph databases
My subjective view: > 90% of use cases
Billions of nodesand relationships
Wednesday, October 6, 2010
NOSQL challenges?
49
Wednesday, October 6, 2010
NOSQL challenges?๏Mindshare
•But that’s also product usability (“how do you query it?”)
49
Wednesday, October 6, 2010
NOSQL challenges?๏Mindshare
•But that’s also product usability (“how do you query it?”)
๏Tool support
•Both devtime tools and runtime ops tools
• Standards may help?
• ... or maybe just time
49
Wednesday, October 6, 2010
NOSQL challenges?๏Mindshare
•But that’s also product usability (“how do you query it?”)
๏Tool support
•Both devtime tools and runtime ops tools
• Standards may help?
• ... or maybe just time
๏Middleware support
49
Wednesday, October 6, 2010
Middleware support?๏Let me tell you the story about Mike
50
Wednesday, October 6, 2010
Step 1: Buildsing a web site
51
MySQL
PHP n stuff
One box
Wednesday, October 6, 2010
Step II: Whoa, ppl are actually using it?
52
Wednesday, October 6, 2010
Step II: Whoa, ppl are actually using it?
52
MySQL
PHP n stuff
Two boxes
Wednesday, October 6, 2010
Step III: That’s a LOT of pages served...
53
MySQL
PHP n stuff n boxesPHP n stuff PHP n stuff
1 box
Wednesday, October 6, 2010
Step IV: Our DB is completely overwhelmed...
54
MySQL (m)
PHP n stuff n boxesPHP n stuff PHP n stuff
MySQL (s) n boxes
Wednesday, October 6, 2010
Step V: Our DBs are STILL overwhelmed
?55
Wednesday, October 6, 2010
Step V: Our DBs are STILL overwhelmed๏Turns out the problem is due to joins
๏A while back we introduced a new feature
•Recommend restaurants based on the user’s friends (and friends of friends)
• It’s killing us with joins
๏What about sharding?
๏What about SSDs?
56
Wednesday, October 6, 2010
Polyglot persistence (Not Only SQL)๏Data sets are increasingly less uniform
๏Parts of Mike’s data fits well in an RDBMS
๏But parts of it is graph-shaped
• If fits much better in a graph database like Neo4j!
๏But what does the code look like?
57
Wednesday, October 6, 2010
An intervention!There shall be code.
58
Wednesday, October 6, 2010
Conclusion๏There’s an explosion of ‘nosql’ databases out there
• Some are immature and experimental
• Some are coming out of years of battle-hardened production
๏NOSQL is about finding the right tool for the job
• Frequently that’s an RDBMS
•But increasingly commonly an RDBMS is the perfect fit
๏We will have heterogenous data backends in the future
•Now the rest of the stack needs to step up and help developers cope with that 59
Wednesday, October 6, 2010
Not Only SQL
60
Key takeaway
Wednesday, October 6, 2010
http://neotechnology.com
Wednesday, October 6, 2010