How Graphs Make Databases Fun Again: Value in Relationships
Michael Hunger (@mesirii), Webinar, Mar 2016
(Michael)-[:HELPS]->(People)-[:WORK_WITH]->(Neo4j)
• Coding
• Writing
• Speaking
• Helping
• Connecting
• Organizing
Topics
• Databases were No Fun
• A story of Pain
• Why should I care?
• The world is a Graph
• Relational Pain => Graph Pleasure
• Neo4j: Model, Query, Import
• Having Fun in the Developer Zone
• GitHub Events
• Software Analytics
• Neo4j from Java
Databases were No Fun
We have all been there
What pained me
• Object vs. Database Model = Pain
• Hard to Model
• Object relational impedance mismatch (and ORMs)
• Schema evolution = DBA Fights
• Slow queries = JOIN Pain
• Complex Queries = Pages of SQL
• Query Optimization = Denormalization
• Roundtrips (n+1 select, complex operations) = Cumbersome
What saved me? – Meeting Emil
Geek Cruise in 2008
Neo4j: A Story of Pain
With a Happy Ending
History of Neo4j - Problem
• Digital Asset Management System in 2000
• SaaS, many users in many countries
• Two hard use-cases
• Multi language keyword search
• Including synonyms / word hierarchies
• Access Management to Assets for SaaS Scale
• Groups, Hierarchies, Permissions, Realtime
History of Neo4j – Relational Attempt
• Tried with many relational DBs
• JOIN Performance Problems
• Hierarchies, Networks, Graphs
• Modeling Problems
• Data Model evolution
• No Success, even…
• With expensive database consultants!
History of Neo4j – First working Implementation
• Graph Model & API sketched on a napkin
• Nodes connected by Relationships
• Just like your conceptual model
• Implemented network-database in memory
• Java API, fast Traversals
• Worked well, but…
• No persistence, No Transactions
• Long import / export time from relational storage
History of Neo4j - Solution
• Evolved to full fledged database in Java
• With persistence using files + memory mapping
• Transactions with Transaction Log (WAL)
• Lucene for fast entity lookup
• Founded Company in 2007
• Neo4j (REST)-Server
• Neo4j Clustering & HA
• Cypher Query Language
• Today…
Neo Technology Overview
Product
• Neo4j - World’s leading graph database
• 1M+ downloads, adding 70k+ per month
• 150+ enterprise subscription customers including over 50 of the Global 2000
Company
• Neo Technology, Creator of Neo4j
• 140+ employees with HQ in Silicon Valley, London, Munich, Paris and Malmö
• $45M in funding from Fidelity, Sunstone, Conor, Creandum, Dawn Capital
What, Who, Where, How?
Financial Services | Communications | Health & Life Sciences | HR & Recruiting | Media & Publishing | Social Web | Industry & Logistics | Entertainment | Consumer Retail | Information Services | Business Services
http://neo4j.com/use-cases
http://neo4j.com/customers
Why should I care?
Because Relationships Matter
What is it with Relationships?
• World is full of connected people, events, things
• There is “Value in Relationships”!
• What about Data Relationships?
• How do you store your object model?
• How do you explain JOIN tables to your boss?
Neo4j – allows you to connect the dots
• Was built to efficiently store, query and manage highly connected data
• Transactional, ACID
• Real-time OLTP
• Open source
• Highly scalable already on few machines
Value from Data Relationships: Common Use Cases
Internal Applications
• Master Data Management
• Network and IT Operations
• Fraud Detection
Customer-Facing Applications
• Real-Time Recommendations
• Graph-Based Search
• Identity and Access Management
Neo4j Browser – Built-in Learning
RDBMS to Graph – Familiar Examples
Neo4j Browser – Visualization
Demo: Meetup Import
Teaser: Meetup.com Import
• For a Meetup Event
• Import Attendees
• For each Attendee
• Import Interests / Topics
• Import other Meetup Memberships
• Import Cities, Countries
• Other groups our members are in
• Top 10 topics
• Topics & Groups of active Member
https://github.com/ikwattro/meetup2neo
http://markhneedham.com/blog?s=meetup
From RDBMS to Neo4j
Relational Pains = Graph Pleasure
Relational DBs Can’t Handle Relationships Well
• Cannot model or store data and relationships without complexity
• Performance degrades with number and levels of relationships, and database size
• Query complexity grows with need for JOINs
• Adding new types of data and relationships requires schema redesign, increasing time to market
… making traditional databases inappropriate when data relationships are valuable in real-time
Slow development | Poor performance | Low scalability | Hard to maintain
Relational DBs Can’t Handle Relationships Well
• Data Model built for tabular forms, not JOINs; managing connections was bolted on both in schema and query
• Strict schema not suitable for variably structured data which is generated and used by today's applications
• Data volume and JOIN number affect cost of query operation exponentially
• Variable hierarchies and networks are hard to store and query, so many “patterns” were developed
… often only denormalization makes complex relational queries fast, but it destroys the good normalized data-model
Built for Forms | Joins are expensive | Denormalize #FTW
Unlocking Value from Your Data Relationships
• Model your data naturally as a graph of data and relationships
• Drive graph model from domain and use-cases
• Use relationship information in real-time to transform your business
• Add new relationships on the fly to adapt to your changing requirements
High Query Performance with a Native Graph DB
• Relationships are first class citizens
• No need for joins, just follow pre-materialized relationships of nodes
• Query & Data-locality – navigate out from your starting points
• Only load what’s needed
• Aggregate and project results as you go
• Optimized disk and memory model for graphs
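To make "no need for joins, just follow pre-materialized relationships" concrete, here is a minimal in-memory sketch in Python. It is not how Neo4j stores data; it only contrasts scanning a join table with following relationships kept next to each node. The names come from the deck's Person/Friend figure, while the specific friendship pairs and all function names are assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical (person, friend) pairs, as they might sit in a Person-Friend join table.
KNOWS_ROWS = [
    ("Andreas", "Tobias"),
    ("Andreas", "Mica"),
    ("Andreas", "Delia"),
    ("Tobias", "Mica"),
]


def friends_via_join_table(rows, person):
    """Relational style: scan every row of the join table for a match."""
    return sorted(friend for owner, friend in rows if owner == person)


def build_adjacency(rows):
    """Graph style: materialize each node's outgoing relationships once."""
    adjacency = defaultdict(list)
    for owner, friend in rows:
        adjacency[owner].append(friend)
    return adjacency


def friends_via_adjacency(adjacency, person):
    """Follow the relationships stored with one node; work depends on its degree only."""
    return sorted(adjacency.get(person, []))


ADJACENCY = build_adjacency(KNOWS_ROWS)
print(friends_via_adjacency(ADJACENCY, "Andreas"))
```

Both functions return the same friends; the difference is that the second one never looks at rows belonging to other people, which is the locality argument made on the slide.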
Relational Versus Graph Models
[Figure: Relational Model with Person, Person-Friend and Friend tables next to a Graph Model with KNOWS relationships between ANDREAS, TOBIAS, MICA and DELIA]
For Instance… / …Is Actually
[Figure: graph model with nodes Order, Product, Customer, Employee, Category and Supplier, connected by the relationships SOLD, ORDERS, REPORTS_TO, PART_OF, PURCHASED and SUPPLIES]
Express Complex Queries Easily with Cypher
Find all direct reports and how many people they manage, each up to 3 levels down
Cypher Query
MATCH (boss)-[:MANAGES*0..3]->(mgr),
      (mgr)-[:MANAGES*1..3]->(report)
WHERE boss.name = "John Doe"
RETURN mgr.name AS Subordinate, count(report) AS Total
SQL Query
[Figure: the equivalent SQL query, shown as an image on the slide]
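The logic of the Cypher query above can be sketched in plain Python to show what the two variable-length patterns compute. This is an illustration only: the MANAGES data and the function names are hypothetical, and the sketch assumes the management structure is a tree, so each person is reached by exactly one path (in a general graph, Cypher would count one row per matching path).

```python
from collections import deque

# Hypothetical MANAGES relationships: manager -> direct reports (a tree).
MANAGES = {
    "John Doe": ["Alice", "Bob"],
    "Alice": ["Carol", "Dave"],
    "Carol": ["Erin"],
}


def within_depth(graph, start, min_depth, max_depth):
    """Return nodes reachable from start in min_depth..max_depth hops."""
    found = []
    queue = deque([(start, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth >= min_depth:
            found.append(node)
        if depth < max_depth:
            for child in graph.get(node, []):
                queue.append((child, depth + 1))
    return found


def report_counts(graph, boss, levels=3):
    """Mirror of the two patterns: boss -[*0..levels]-> mgr -[*1..levels]-> report."""
    totals = {}
    for mgr in within_depth(graph, boss, 0, levels):
        reports = within_depth(graph, mgr, 1, levels)
        if reports:  # a manager without any report produces no row in the MATCH
            totals[mgr] = len(reports)
    return totals


print(report_counts(MANAGES, "John Doe"))
```

Because the first pattern starts at zero hops, the boss appears in the result alongside the managers below.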
High Query Performance: Some Numbers
• Traverse 2-4M+ relationships per second and core
• Cost based query optimizer – complex queries return in milliseconds
• Import 100K-1M records per second transactionally
• Bulk import tens of billions of records in a few hours
Working with a Graph
Model, Import, Query
The Whiteboard Model Is the Physical Model
Eliminates Graph-to-Relational Mapping
In your data model: Bridge the gap between business and IT models
In your application: Greatly reduce need for application code
[Figure: a person node (name: "Dan", born: May 29, 1970, twitter: "@dan"), a person node (name: "Ann", born: Dec 5, 1975), a relationship property (since: Jan 10, 2011) and a CAR node (brand: "Volvo", model: "V70")]
Property Graph Model Components
Nodes
• The objects in the graph
• Can have name-value properties
• Can be labeled
Relationships
• Relate nodes by type and direction
• Can have name-value properties
[Figure: two PERSON nodes connected by LOVES, LOVES and LIVES WITH relationships]
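The components listed above map onto two small data structures. The following Python sketch is an assumption-laden illustration, not Neo4j's API: the class names, the `match` helper and the relationship directions are made up here, with only the labels, relationship types and the names Dan and Ann taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                      # e.g. {"Person"}
    properties: dict = field(default_factory=dict)   # name-value properties


@dataclass
class Relationship:
    start: Node                                      # direction: start -> end
    type: str                                        # e.g. "LOVES"
    end: Node
    properties: dict = field(default_factory=dict)   # relationships carry properties too


dan = Node({"Person"}, {"name": "Dan"})
ann = Node({"Person"}, {"name": "Ann"})
relationships = [
    Relationship(dan, "LOVES", ann),
    Relationship(ann, "LOVES", dan),
    Relationship(dan, "LIVES_WITH", ann),
]


def match(rels, start_props, rel_type, end_props):
    """Return relationships whose type and endpoint properties fit the pattern."""
    def has(node, wanted):
        return all(node.properties.get(key) == value for key, value in wanted.items())
    return [r for r in rels
            if r.type == rel_type and has(r.start, start_props) and has(r.end, end_props)]


found = match(relationships, {"name": "Dan"}, "LOVES", {"name": "Ann"})
print(len(found))
```

The helper does a linear scan for simplicity; a real graph database would start from an indexed node and follow its relationships instead.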
Cypher: Powerful and Expressive Query Language
MATCH (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
[Figure: the pattern annotated with NODE, LABEL and PROPERTY for both Dan and Ann, connected by the LOVES relationship]
Getting Data into Neo4j
Cypher-Based “LOAD CSV” Capability
• Transactional (ACID) writes
• Initial and incremental loads of up to 10 million nodes and relationships
Command-Line Bulk Loader neo4j-import
• For initial database population
• For loads up to 10B+ records
• Up to 1M records per second
4.58 million things and their relationships… Loads in 100 seconds!
Querying Your Data
Basic Pattern: Tom Hanks' Movies?
MATCH (:Person {name: "Tom Hanks"})-[:ACTED_IN]->(:Movie {title: "Forrest Gump"})
[Figure: the pattern annotated with NODE, LABEL and PROPERTY for Tom Hanks and Forrest Gump, connected by the ACTED_IN relationship]
Basic Query: Tom Hanks' Movies?
MATCH (actor:Person)-[:ACTED_IN]->(m:Movie)
WHERE actor.name = "Tom Hanks"
RETURN *
Query Comparison: Colleagues of Tom Hanks?
SELECT *
FROM Person AS actor
JOIN ActorMovie AS am1 ON (actor.id = am1.actor_id)
JOIN ActorMovie AS am2 ON (am1.movie_id = am2.movie_id)
JOIN Person AS coll ON (coll.id = am2.actor_id)
WHERE actor.name = 'Tom Hanks'

MATCH (actor:Person)-[:ACTED_IN]->()<-[:ACTED_IN]-(coll:Person)
WHERE actor.name = "Tom Hanks"
RETURN *
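The relational side of this comparison can be tried out with Python's built-in `sqlite3` module. The sketch below keeps the Person and ActorMovie tables from the slide; the sample rows and numeric ids are made up for illustration. It also adds `DISTINCT` and a `coll.id <> actor.id` filter, which are not on the slide: without that filter the self-join also pairs Tom Hanks with himself, whereas the Cypher pattern does not reuse the same ACTED_IN relationship twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE ActorMovie (actor_id INTEGER, movie_id INTEGER);
-- Illustrative rows only; ids and pairings are assumptions.
INSERT INTO Person VALUES (1, 'Tom Hanks'), (2, 'Meg Ryan'),
                          (3, 'Gary Sinise'), (4, 'Keanu Reeves');
INSERT INTO ActorMovie VALUES (1, 10), (2, 10), (1, 20), (3, 20), (4, 30);
""")

COLLEAGUES_SQL = """
SELECT DISTINCT coll.name
FROM Person AS actor
JOIN ActorMovie AS am1 ON actor.id = am1.actor_id
JOIN ActorMovie AS am2 ON am1.movie_id = am2.movie_id
JOIN Person AS coll ON coll.id = am2.actor_id
WHERE actor.name = ? AND coll.id <> actor.id
ORDER BY coll.name
"""


def colleagues(connection, name):
    """Return the names of people who share at least one movie with `name`."""
    return [row[0] for row in connection.execute(COLLEAGUES_SQL, (name,))]


print(colleagues(conn, "Tom Hanks"))
```

Each additional hop in the question (colleagues of colleagues, for example) would add two more joins to the SQL, while the Cypher pattern just grows by one more path segment.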
Most prolific actors and their filmography?
MATCH (p:Person)-[:ACTED_IN]->(m:Movie)
RETURN p.name, count(*), collect(m.title) as movies
ORDER BY count(*) desc, p.name asc
LIMIT 10;
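For readers less familiar with Cypher's implicit grouping, the aggregation in the query above can be sketched in Python: rows are grouped by actor name, `count(*)` becomes the group size and `collect(m.title)` becomes the list of titles. The (actor, title) pairs and the function name are illustrative assumptions, not data from the deck.

```python
from collections import defaultdict

# Illustrative ACTED_IN pairs: (actor name, movie title).
ACTED_IN = [
    ("Tom Hanks", "Forrest Gump"),
    ("Tom Hanks", "Apollo 13"),
    ("Gary Sinise", "Forrest Gump"),
    ("Gary Sinise", "Apollo 13"),
    ("Meg Ryan", "Sleepless in Seattle"),
]


def most_prolific(acted_in, limit=10):
    """Group by actor, count movies, collect titles, then order and limit."""
    movies_by_actor = defaultdict(list)
    for actor, title in acted_in:
        movies_by_actor[actor].append(title)          # collect(m.title)
    rows = [(actor, len(titles), titles)              # count(*) per group
            for actor, titles in movies_by_actor.items()]
    rows.sort(key=lambda row: (-row[1], row[0]))      # count desc, name asc
    return rows[:limit]                               # LIMIT


for name, total, movies in most_prolific(ACTED_IN):
    print(name, total, movies)
```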
Neo4j Query Planner
Cost based Query Planner since Neo4j 2.2
• Uses database stats to select best plan
• Currently for Read Operations
• Query Plan Visualizer, finds:
• Non optimal queries
• Cartesian Product
• Missing Indexes, Global Scans
• Typos
• Massive Fan-Out
Query Planner
Slight change, add a :Person label -> more stats available -> new plan with fewer database-hits
Demo: Software Analytics
jqassistant.org
Software Analytics
Software is connected information, aka a graph
• Source -> AST
• Inheritance, Composition, Delegation
• Call Trees
• Runtime Memory
• Dependencies
• Modules, Libraries
• Tests
• ...
https://jqassistant.org
jQAssistant
• Geek Cruise: My first Neo4j project
• Software deteriorates
• Develop rules and enforce them
• Commercial Tools too inflexible
• Open Source Software ...
• Scanner -> Enhancer -> Analyzer
• Enrichment, Concepts and Rules in Cypher
• Scanner Plugins
• Integrate in Build Process
• Fail, Generate Reports, ...
https://jqassistant.org
Let's explore...
https://jqassistant.org
... The JDK
Demo: GitHub Events
github.com/ikwattro
Demo: GitHub Events
• GitHub: Social Coding
• GH-Events:
• Fork, Comment, PR, ...
• Archive + API
• Application for Import:
• GitHub -> PHP -> RabbitMQ -> Neo4j
• Data:
• 1M Users, 1.4M Repos, 62k Orgs
• 13M Events
https://github.com/ikwattro/github2cypher
https://github.com/ikwattro/github-event
Model: GitHub Events (partial)
Event: Watch a Repository
MATCH (w:WatchEvent)
WITH w LIMIT 1
MATCH p = (w)-[:EVENT_TIME]->(:Minute)<-[:CHILD*]-(:Year),
      (w)-[:EVENT_ACTOR]->(u:User)-->(r:Repository)<-[:WATCHED_REPOSITORY]-(w)
RETURN *;
https://twitter.com/ikwattro/status/618431227100532737
Using Neo4j from Java: Choices Galore!
neo4j.com/developer/java
Neo4j: OSS, Made in Java
• Java 7/8, Scala
• High Performance IO,
• Memory Mapping,
• Collections, Caches, Cursors,
• Remoting,
• Paxos, Raft, ...
• Libraries
• Jetty, Lucene, Netty, Parboiled, ...
https://github.com/neo4j/neo4j
Using: Neo4j from Java
• Neo4j Java APIs
• Server Extensions
• Embedded
• (parallel) Batch-Import APIs
• REST/HTTP
• JDBC Driver
• Neo4j-OGM
• Spring Data Neo4j
• Upcoming: Binary Protocol Driver
https://neo4j.com/developer/java
Get up to speed with Neo4j Quickly and Easily
Users Love Neo4j
There Are Lots of Ways to Easily Learn Neo4j
neo4j.com/developer
Summary: Introduction to Neo4j
Neo4j Allows You…
• Keep your rich data model
• Handle relationships efficiently
• Write queries easily
• Develop applications quickly
For .Net Developers
• Neo4j Installer
• Drivers for Neo4j from .Net
• Host Database on Azure
• Deploy Apps to Azure
Thank You! Ask Questions, or Tweet
@neo4j | http://neo4j.com
@mesirii | Michael Hunger