Invited lecture on NoSQL databases and modern web-development frameworks.
• JavaScript + JSON = easy parsing, less verbose code
• NodeJS = asynchronous everything. Needs precise flow control
• ElasticSearch = Scalable indexing, easy-to-use JSON API
• GridFS = Transparent scaling for huge numbers of large files; querying using JSON-based API
• Graph Databases = Model certain problems better than their relational counterparts. Simpler queries using SPARQL. Less mature than RDBMSs. No transactions.
• Socket.io = Real-time library for client-server-client push communication
Web frameworks and graph databases
Overview and code demos
João Rocha da Silva
May 2014
Contents
• Modeling limits of relational databases
• Entities with variable attributes
• Time-variant values
• Inheritance
• Hierarchies (parents of parents of parents…)
Contents (cont’d)
• Modeling problems in a graph
• Ontologies and SPARQL
• OpenLink Virtuoso
• Scalable file storage: GridFS within MongoDB
• Scalable document indexing: ElasticSearch
• NodeJS and asynchronous flow control
• AngularJS for dynamic web interfaces
• BONUS: Socket.io sneak peek
Relational databases
• Good when you know everything about the problem at the time of modeling
• A column can only be of a single type (VARCHAR, INT, etc.)
• Hard to document
• Model can become too attached to the code
Relational databases
• Handling historical values = complex SQL
• Hierarchies = Foreign Key loops
• Variable attributes, inheritance = either NULL columns plus "if" hell, or many JOINs
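To make the hierarchy point concrete, here is a minimal sketch that models a self-referencing table as an array of rows. The `folders` rows and the `ancestors` helper are hypothetical and not taken from the lecture. Resolving "parents of parents" takes one lookup per level, which in SQL becomes one self-JOIN per level or a recursive query.

```javascript
// Hypothetical rows of a self-referencing table: folders(id, name, parent_id).
const folders = [
  { id: 1, name: 'root', parentId: null },
  { id: 2, name: 'projects', parentId: 1 },
  { id: 3, name: 'dendro', parentId: 2 },
  { id: 4, name: 'docs', parentId: 3 },
];

// Walks the parentId chain upwards; each step stands for one more lookup (or self-JOIN).
function ancestors(rows, id) {
  const byId = new Map(rows.map((row) => [row.id, row]));
  const seen = new Set([id]); // guards against accidental foreign-key cycles
  const chain = [];
  let current = byId.get(id);
  while (current && current.parentId !== null) {
    if (seen.has(current.parentId)) {
      throw new Error(`cycle detected at folder ${current.parentId}`);
    }
    seen.add(current.parentId);
    current = byId.get(current.parentId);
    if (current) chain.push(current.name);
  }
  return chain;
}

console.log(ancestors(folders, 4)); // [ 'dendro', 'projects', 'root' ]
```

The depth of the hierarchy is not known in advance, so the number of lookups cannot be fixed in the query text. This is the kind of problem the lecture later hands over to a graph model.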
Relational models
(one of 78,826 tables and counting)
source: SAP
Beautiful, meaningful column names ;-)
Even better table names
source: MediaWiki
“Old Versions” aka “copy everything and add a timestamp”
source: MediaWiki
Now imagine we want to store images of different kinds, with different attributes…
[Figure: table schema for an entity with variable, time-dependent attributes. Labels: fixed attributes; attribute name; value (always varchar); timestamps. Source: CKAN]
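The attribute-name / value / timestamp layout can be sketched in a few lines. The rows and the `snapshot` helper below are hypothetical illustrations, not CKAN's actual schema or API. They show why reading an entity back means picking the latest row per attribute, and why every value comes out as a string.

```javascript
// Hypothetical entity-attribute-value rows: every value is stored as a string (varchar).
const rows = [
  { entityId: 'img1', attribute: 'width', value: '800', timestamp: '2014-01-01T10:00:00Z' },
  { entityId: 'img1', attribute: 'width', value: '1024', timestamp: '2014-03-01T10:00:00Z' },
  { entityId: 'img1', attribute: 'format', value: 'png', timestamp: '2014-01-01T10:00:00Z' },
  { entityId: 'img2', attribute: 'exposure', value: '0.5', timestamp: '2014-02-01T10:00:00Z' },
];

// Rebuilds the state of one entity at a given instant: latest value per attribute.
// ISO 8601 timestamps in the same format compare correctly as plain strings.
function snapshot(allRows, entityId, at) {
  const latest = new Map();
  for (const row of allRows) {
    if (row.entityId !== entityId || row.timestamp > at) continue;
    const previous = latest.get(row.attribute);
    if (!previous || row.timestamp > previous.timestamp) latest.set(row.attribute, row);
  }
  const result = {};
  for (const [attribute, row] of latest) result[attribute] = row.value;
  return result;
}

console.log(snapshot(rows, 'img1', '2014-02-01T00:00:00Z')); // { width: '800', format: 'png' }
console.log(snapshot(rows, 'img1', '2014-12-31T00:00:00Z')); // { width: '1024', format: 'png' }
```

The application has to convert `'800'` back to a number itself, because the single varchar value column discards the original type.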
Graph models
Graph databases
• Represent entities (Users, Products, Places…) as vertices (entity types are called classes)
• Connections between them are directed graph edges (edge types are called properties)
• The meaning of these connections is expressed in ontologies that can be shared and reused
Representing a person using ontologies
[Figure: RDF graph for the resource http://www.fe.up.pt/~pro11004]
• foaf:name → “João Rocha”
• rdf:type → up:PhDStudent
• org:memberOf → http://www.fe.up.pt/
Ontology references: http://www.w3.org/TR/rdf-schema/, http://www.foaf-project.org/
Getting all the students
SELECT ?uri ?attribute ?value
FROM <http://myorganization.com/data>
WHERE {
  ?uri rdf:type up:Student .
  ?uri ?attribute ?value
}
• Will fetch all the students, regardless of their type
• Will also return their attributes (“database columns”)
• Different types of students will have different attributes
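The behaviour of the two triple patterns can be imitated in memory. This is only a sketch: the `triples` array, the `ex:` identifiers and the `studentsWithAttributes` helper are hypothetical, and a real deployment would send the SPARQL query to a triple store instead.

```javascript
// Hypothetical triples (subject, predicate, object); prefixed names are kept as plain strings.
const triples = [
  ['ex:joao', 'rdf:type', 'up:Student'],
  ['ex:joao', 'foaf:name', 'João Rocha'],
  ['ex:joao', 'org:memberOf', 'http://www.fe.up.pt/'],
  ['ex:ana', 'rdf:type', 'up:Student'],
  ['ex:ana', 'foaf:name', 'Ana'],
  ['ex:ana', 'ex:scholarship', 'yes'],
  ['ex:lab', 'rdf:type', 'ex:Laboratory'],
];

// Mimics: WHERE { ?uri rdf:type up:Student . ?uri ?attribute ?value }
function studentsWithAttributes(data) {
  // First pattern: bind ?uri to every subject typed as up:Student.
  const students = new Set(
    data.filter(([, p, o]) => p === 'rdf:type' && o === 'up:Student').map(([s]) => s)
  );
  // Second pattern: return every triple whose subject is one of those students.
  return data
    .filter(([s]) => students.has(s))
    .map(([uri, attribute, value]) => ({ uri, attribute, value }));
}

const results = studentsWithAttributes(triples);
console.log(results.length); // 6
```

Each student contributes a different set of attributes to the result, and no schema change was needed to add `ex:scholarship` to only one of them.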
Inference
• Transitive Properties (subclass of subclass…)
• Subclasses
• Multiple Inheritance Handling (Student + Researcher + ScholarshipHolder)
Saves coding time spent writing complex queries
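What the inference engine does with subclasses can be sketched as a transitive closure over the class hierarchy. The `subClassOf` table and the `inferredTypes` helper below are hypothetical; in a triple store this reasoning would be configured in the database rather than written in application code.

```javascript
// Hypothetical class hierarchy: each class lists its direct superclasses (rdfs:subClassOf).
const subClassOf = {
  'up:PhDStudent': ['up:Student', 'up:Researcher'], // multiple inheritance
  'up:MScStudent': ['up:Student'],
  'up:Student': ['foaf:Person'],
  'up:Researcher': ['foaf:Person'],
};

// Returns the class plus every superclass reachable through subClassOf (transitive closure).
function inferredTypes(hierarchy, cls) {
  const found = new Set();
  const pending = [cls];
  while (pending.length > 0) {
    const current = pending.pop();
    if (found.has(current)) continue; // already visited via another path
    found.add(current);
    for (const parent of hierarchy[current] || []) pending.push(parent);
  }
  return found;
}

const types = inferredTypes(subClassOf, 'up:PhDStudent');
console.log(types.has('up:Student'), types.has('foaf:Person')); // true true
```

With these inferred types available, a query for `up:Student` would also match resources that are only declared as `up:PhDStudent`, without listing each subclass in the query.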
Nothing comes for free
• Aggregation operators are slow
• Transactions are not supported in standard SPARQL
• (SPARQL 1.1 Update services should treat requests atomically, but they are not required to.)
• Graph DBMS Solutions are in early stages (many bugs, many “beta”s, many mailing lists…)
An example application
Dendro (dendro-dev.fe.up.pt:3001)
• Dropbox and File/Folder description platform
• Variable descriptions
• Time-dependent values
• Directory structures (hierarchy)
• Need for simple querying…
[Figure: Dendro graph model of a dataset and its revisions]
Panels: Current dataset version; Past Revisions; Change recording
Nodes: Pn, Dn, Dn-1, Fn, Un, C
Properties: nie:isLogicalPartOf, dc:title, dc:creator, dc:created, dc:modified, dc:isReferencedBy, dc:isVersionOf, dcb:initialCrackLength, dcb:specimenWidth, ddr:pertainsTo, ddr:changedDescriptor, ddr:operation, ddr:initialCrackLength
Literal values: “DCB Base Data”, 120, 280mm, 01/01/2014^^xsd:date, “add”
Annotations: added property instance; changed modification timestamp; revision creation timestamp
[Figure: Dendro architecture]
• Socket.io: real-time events
• NodeJS: business logic
• AngularJS: dynamic interfaces à la Google Docs
• Files: GridFS
• Database: OpenLink Virtuoso
• Free-text search: ElasticSearch
Code Demos
NodeJS (Dendro) http://192.168.5.75:3001
GridFS http://192.168.5.75:27017
OpenLink Virtuoso http://192.168.5.75:8890
ElasticSearch http://192.168.5.75:9200/_plugin/head/
Socket.io (BattleBits) http://localhost:3000
Conclusions
• JavaScript + JSON = easy parsing, less verbose code
• NodeJS = asynchronous everything. Needs precise flow control
• ElasticSearch = Scalable indexing, easy-to-use JSON API
• GridFS = Transparent scaling for huge numbers of large files; querying using JSON-based API
• Graph Databases = Model certain problems better than their relational counterparts. Simpler queries using SPARQL. Less mature than RDBMSs. No transactions.
• Socket.io = Real-time library for client-server-client push communication
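The flow-control point can be illustrated with a short sketch. It uses promises and `async`/`await`, which are a modern substitute for the callback style NodeJS code used at the time of the lecture. The `fakeIo` helper and the step labels are hypothetical, with a timer standing in for a database or HTTP call.

```javascript
// Hypothetical asynchronous step; a timer stands in for a database or HTTP call.
function fakeIo(label, delayMs) {
  return new Promise((resolve) => setTimeout(() => resolve(label), delayMs));
}

// Series: each step waits for the previous one, so the execution order is guaranteed.
async function inSeries() {
  const file = await fakeIo('file stored', 20);
  const metadata = await fakeIo('metadata saved', 10);
  const index = await fakeIo('document indexed', 5);
  return [file, metadata, index];
}

// Parallel: all steps start at once; Promise.all keeps results in input order
// even though the shortest timer finishes first.
async function inParallel() {
  return Promise.all([
    fakeIo('file stored', 20),
    fakeIo('metadata saved', 10),
    fakeIo('document indexed', 5),
  ]);
}

inSeries().then((steps) => console.log('series:', steps));
inParallel().then((steps) => console.log('parallel:', steps));
```

Choosing between the two is the "precise flow control" decision: series is needed when a step depends on the result of the previous one, while parallel is faster when the steps are independent.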
João Rocha da Silva is an Informatics Engineering PhD student at the Faculty of Engineering of the University of Porto. He specializes in research data management, applying the latest Semantic Web technologies to the adequate preservation and discovery of research data assets.
He is experienced in many programming languages (JavaScript-Node, PHP with MVC frameworks, Ruby on Rails, J2EE, etc.) running on the major operating systems (everyday Mac user). Regardless of language, he is a quick learner who can adapt to any new technology quickly and effectively.
He is also an experienced freelance iOS developer with several apps published on the App Store, and a self-taught DIY mechanic with a special interest in classic cars, particularly his 1987 Toyota Corolla GT Twin Cam, also known as Hachi-Roku or AE86.
Research Data Management and Semantic Web Researcher, Web & iPhone Developer
João Rocha da Silva