thor-henning-hetland
I'll do a talk on how we've used Neo4j for data quality analysis and corrections, as well as breed analysis and more, at NKK, where throughput (100-9,000 dogs/second and 200-20,000 queries/second) is an important metric. :)
Neo4Dogs
Innovation
Intelligent Systems
Software Engineering
Graph Cafe, Teknologihuset, Oslo, 27.06.2014
Totto-14
@javatotto / [email protected]
A Global Leader
AMERICAS
EUROPE
ASIA
Bringing our customers' projects to life and boosting their performance through technology and innovation
€ +1 633 m REVENUES in 2013
+20 000 EMPLOYEES in 2013
+20 COUNTRIES
R&D and Innovation
For 30 years, Altran has had a close relationship with innovation.
Where creative ideas become a reality, Altran consultants step up to transform ideas into innovative solutions that can enable technological progress.
In this way, Altran has contributed to major technological advances in recent decades: speed, precision, security, communication, practicality, interoperability, artificial intelligence...
AEGT: the world's most powerful electric car
Altran was responsible for designing and engineering the electric transmission on this car, capable of reaching speeds of 300 km/h.
Solar Impulse: the first plane to fly on solar energy alone
Since 2003, Altran experts have dedicated their skills to bringing about this formidable technical and human achievement.
The Airport of the Future: outlining a ‘friend-lean’ space in 2040
Altran develops revolutionary concepts for airports responding to long-term changes in the industry.
Agenda
● Situation analysis
● From dog register via case management to dog-hub
● The platform
● Performance and some metrics
Initial analysis
● From register to case management – over 20 years of legacy..
– Dog information spread across 30+ relational tables
– 2-3 weeks of work to retrieve «a dog» with some info (every time)
– «impossible» to store new types of data/information on a dog
– Data was hidden/unavailable to people -> «data rot»
– Cascading costs of change and new features
● Recognized the need for a different approach
– But how to get out of the squeeze was not obvious..
– Limited technical skills, system knowledge and functional knowledge
– No time, capacity or money to do a «full rewrite»
● We selected a bottom-up, data-first platform approach, with strong capabilities for continuous data quality processes and strong support for semi-structured data.
From dog register to case management to dog hub
● Quick and easy access to individual dogs
● Scale - 10 to 50 integrations with other systems (hub)
● Handling individual dogs of «questionable» data quality
● Easily extendable to store more data on any individual
– Semi-structured strategy for persistence/storage
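As a rough illustration of the semi-structured idea, the sketch below keeps a few core fields fixed and lets everything else travel in a free-form map, so a new kind of information needs no schema change. The field names, the `extra` key and the sample ID are hypothetical and are not taken from the NKK platform.

```python
import json

# Hypothetical core fields; everything else goes into a free-form "extra" dict.
CORE_FIELDS = ("id", "name", "breed")


def make_dog_record(core, **extra):
    """Build a dog record with required core fields and optional extra data."""
    missing = [field for field in CORE_FIELDS if field not in core]
    if missing:
        raise ValueError(f"missing core fields: {missing}")
    record = dict(core)
    record["extra"] = dict(extra)
    return record


def add_info(record, key, value):
    """Return a copy of the record with one more piece of information attached."""
    updated = dict(record)
    updated["extra"] = {**record["extra"], key: value}
    return updated


# Made-up sample data: a new attribute is added without touching any schema.
dog = make_dog_record({"id": "NO-0001", "name": "Fido", "breed": "Dunker"})
dog = add_info(dog, "hip_score", "A")
dog_json = json.dumps(dog, sort_keys=True)
```

The serialized form is a single JSON document per dog, which is one plausible way to feed a search index or a document store.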
Top level architecture
The platform we built
● Dog search & lookup
– SolrCloud with "json_full"
● DogPopulationService
– Pedigree, population structure, breed data
– Data error, data deviation, data missing -> DogFixer
● DogIDMapper (multi-source, multi-master, map different ID-schemes)
● DogCrawler
– Is it possible to find additional data to fix this individual?
● DogFixer
– Is it possible to statistically find the right answer?
– Manual process in some corner-cases / difficult cases
● DogServiceREST
– verify & merge, writeback updates
– «tailing» datasources of dog information
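To make the pedigree part concrete, here is a minimal in-memory sketch of walking a pedigree as a graph and of flagging individuals with missing parents. It is not the NKK code and it does not use Neo4j; the `PEDIGREE` data, the function names and the idea of handing incomplete dogs to a fixer step are assumptions for illustration only.

```python
from collections import deque

# Hypothetical pedigree: dog id -> (sire id, dam id); None means unknown parent.
PEDIGREE = {
    "pup": ("sire", "dam"),
    "sire": ("grandsire", "granddam"),
    "dam": ("grandsire", None),
    "grandsire": (None, None),
    "granddam": (None, None),
}


def ancestors(dog_id, pedigree, max_generations):
    """Breadth-first walk up the pedigree.

    Returns {ancestor id: generation at which it was first reached}.
    """
    found = {}
    queue = deque([(dog_id, 0)])
    while queue:
        current, generation = queue.popleft()
        if generation == max_generations:
            continue  # do not look beyond the requested depth
        for parent in pedigree.get(current, (None, None)):
            if parent is not None and parent not in found:
                found[parent] = generation + 1
                queue.append((parent, generation + 1))
    return found


def missing_parents(pedigree):
    """Dogs with at least one unknown parent, sorted by id."""
    return sorted(dog for dog, parents in pedigree.items() if None in parents)


two_generations = ancestors("pup", PEDIGREE, 2)
incomplete = missing_parents(PEDIGREE)
```

Each step up the pedigree is a dictionary lookup here; in a relational model the same step is typically a self-join, which is where deep pedigrees get expensive.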
Some numbers
● 2 million requests/hour
● 10 million requests/24 hours
● Breed calculations went from taking «months» to «instant»
– 200-500 joins per individual, 1000/year, 10 years = 2-8 sec
● Latency: 0.2 sec for 99.7% of requests
● DogIDMapper: 4000 dogs/sec
● DogGraph: 3000 dogs/sec
● DogFixer: 10-15 dogs/sec
● DogCrawler: 100-200 dogs/sec
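For scale, the request figures above can be converted to per-second rates. The arithmetic is a sketch under two assumptions that the slides do not state: that the hourly figure is a peak and the 24-hour figure is a daily total, and that the 99.7% latency figure applies to the daily total.

```python
SECONDS_PER_HOUR = 3600


def per_second(count, hours):
    """Average rate per second for `count` events spread over `hours` hours."""
    return count / (hours * SECONDS_PER_HOUR)


# Assumption: 2 million requests/hour is a peak hour.
peak_rate = per_second(2_000_000, 1)

# Assumption: 10 million requests/24 hours is a daily total.
daily_average_rate = per_second(10_000_000, 24)

# 99.7% of requests within 0.2 sec leaves 0.3% slower than that.
slow_requests_per_day = 10_000_000 * 3 // 1000

print(f"peak: about {peak_rate:.0f} requests/sec")
print(f"daily average: about {daily_average_rate:.0f} requests/sec")
print(f"slower than 0.2 sec: about {slow_requests_per_day} requests/day")
```

Under those assumptions the peak hour runs at roughly five times the daily average.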
Handle huge spikes
And survive «issues» with low latency
Try it out:
* http://dogsearch.nkk.no
* http://dogpopulation.nkk.no/
* http://dogpopulation.nkk.no/ras/?breed=Dunker
* http://dogpopulation.nkk.no/dogpopulation/concurrent/executor/status
* Code: by request :)