Dr. Andreas Both, Head of R&D, Unister — LDBC, Barcelona, 2015-03-20 Slide 1
E-Commerce and Graph-driven Applications: Experiences and Optimizations while moving to Linked Data
Andreas Both, Head of Research and Development, UNISTER GmbH, Germany
Slide 2
Unister Group
e-commerce company
founded 2002
major B2C web portals in Germany (and Europe)
verticals: travel, flights, travel packages, retail, . . .
integrated business model
10 million unique users per month (Germany, AGOF)
increased number of employees
2003: 1
2015: 1600
Slide 3
Use Case
Agenda for e-commerce companies:
take advantage of linked data
unchain datastores from schema
Requirements:
fast
robust
scalable
→ Users: I want it all. I want it now.
Slide 4
Typical Data Structures and Queries
hierarchical (directed) region graph
hotels and regions might have many features
typical queries: select hotels by several of their features
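To make this data shape concrete, here is a minimal Python sketch under stated assumptions: a toy region tree, hotels with sets of numeric feature IDs, and a query for hotels that lie in a region (or any sub-region) and have all requested features. The names `PARENT`, `HOTELS`, `regions_under` and `find_hotels` and all data are hypothetical, not Unister's schema.

```python
from collections import defaultdict

# Hypothetical toy data: each region points to its parent region (directed hierarchy).
PARENT = {
    "Germany": "Europe",
    "Spain": "Europe",
    "Berlin": "Germany",
    "Barcelona": "Spain",
}

# Hypothetical hotels: name -> (located-in region, set of feature IDs).
HOTELS = {
    "hotel1": ("Berlin", {5, 17, 56}),
    "hotel2": ("Barcelona", {5, 17, 18, 56}),
    "hotel3": ("Barcelona", {18}),
}

def regions_under(root):
    """Return root plus every region that (transitively) lies below it."""
    children = defaultdict(list)
    for child, parent in PARENT.items():
        children[parent].append(child)
    result, stack = set(), [root]
    while stack:
        region = stack.pop()
        result.add(region)
        stack.extend(children[region])
    return result

def find_hotels(region, features):
    """Hotels inside `region` (or any sub-region) that have all `features`."""
    allowed = regions_under(region)
    return sorted(
        name
        for name, (located_in, feats) in HOTELS.items()
        if located_in in allowed and set(features) <= feats
    )

print(find_hotels("Europe", [5, 17, 56]))  # ['hotel1', 'hotel2']
print(find_hotels("Spain", [18]))          # ['hotel2', 'hotel3']
```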
Slide 5
Example Query
PREFIX uo:   <http://ontology.unister.de/ontology#>
PREFIX uor:  <http://ontology.unister.de/resource/>
PREFIX uorf: <http://ontology.unister.de/hotel/facility/>
PREFIX uos:  <http://ontology.unister.de/skos/>
SELECT DISTINCT ?s {
  ?s a uo:Hotel ;
     uo:hasFeature uorf:56, uorf:18, uorf:21, uorf:210, uorf:5, uorf:211, uorf:34, uorf:17 ;
     uo:locatedIn uor:Europe ;
     uo:suitableFor uos:Family
} LIMIT 10
Slide 6
Experiences: standard search process
A search for attributes
  1 very selective
  2 less selective
B pick a region
C sort the results
D limit the selection
Setting:
Dataset: 71,600 hotels, 278,277 resources, 3,022,583 literals
Virtuoso: version 7.1 (Fast Track¹), 824 MB, buffer size: 70,000
Experiments: 20 runs, charts show median
¹ https://github.com/v7fasttrack/virtuoso-opensource/tree/feature/emergent
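The measurement setting (20 runs, median reported) could be reproduced with a small harness like the following. It is a sketch, assuming `run_query` is any hypothetical callable that sends one query and waits for the complete answer; how the queries were actually issued is not stated in the slides. The median is used because a few slow outliers, such as cold-cache runs, barely move it.

```python
import statistics
import time
from typing import Callable

def median_runtime(run_query: Callable[[], object], runs: int = 20) -> float:
    """Call run_query `runs` times; return the median wall-clock time in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_query()  # hypothetical: one complete query round trip
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)

# Usage with a stand-in workload instead of a real SPARQL request.
print(f"{median_runtime(lambda: sum(range(100_000))):.6f} s")
```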
Slide 7
Requirements for Industrial Applicability (in e-commerce)
requirements for replacing traditional databases:
fast: short response time
search query refinement → shorter response time
robust: similar answer times
easy to scale up
system resource efficient
→ requirements not fulfilled
Slide 8
Example Query
PREFIX uo:   <http://ontology.unister.de/ontology#>
PREFIX uor:  <http://ontology.unister.de/resource/>
PREFIX uorf: <http://ontology.unister.de/hotel/facility/>
PREFIX uos:  <http://ontology.unister.de/skos/>
SELECT DISTINCT ?s {
  ?s a uo:Hotel ;
     uo:hasFeature uorf:56, uorf:18, uorf:21, uorf:210, uorf:5, uorf:211, uorf:34, uorf:17 ;
     uo:locatedIn uor:Europe ;
     uo:suitableFor uos:Family
} LIMIT 10
Slide 9
Data Preparation
hotel entity   p1  p2  p3  …  pn
hotel1          0   0   1  …   0
hotel2          1   0   1  …   1
hotel3          1   1   1  …   0
hotel4          1   0   1  …   1
⋮               ⋮   ⋮   ⋮      ⋮
hotelm          0   0   1  …   0
BitSet representation of (hotel) properties: one bitset per property over all hotels, e.g. p2 ≙ 0010…0 (bit i is set iff hotel i has p2)
Advantages:
no index
very small
operations in-memory
easy update
easy insert
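A minimal sketch of this layout, assuming Python integers as arbitrary-length bitsets (the slides do not name the BitSet implementation): each property becomes one bitset over hotel positions, and a multi-feature query is a bitwise AND of those bitsets. The table rows mirror the one above; `build_bitsets`, `hotels_with_all` and `add_hotel` are hypothetical names.

```python
from typing import Dict, Iterable, List, Set

# Hotel x property table from the slide: row i lists the properties of hotel i+1.
ROWS: List[Set[str]] = [
    {"p3"},                # hotel1: 0 0 1 ... 0
    {"p1", "p3", "pn"},    # hotel2: 1 0 1 ... 1
    {"p1", "p2", "p3"},    # hotel3: 1 1 1 ... 0
    {"p1", "p3", "pn"},    # hotel4: 1 0 1 ... 1
]

def build_bitsets(rows: List[Set[str]]) -> Dict[str, int]:
    """One bitset (a Python int) per property; bit i is set iff hotel i has it."""
    bitsets: Dict[str, int] = {}
    for i, props in enumerate(rows):
        for p in props:
            bitsets[p] = bitsets.get(p, 0) | (1 << i)
    return bitsets

def hotels_with_all(bitsets: Dict[str, int], props: Iterable[str], n_hotels: int) -> List[int]:
    """AND the requested property bitsets and return matching hotel indices."""
    acc = (1 << n_hotels) - 1          # start from "all hotels"
    for p in props:
        acc &= bitsets.get(p, 0)       # unknown property: nothing matches
    return [i for i in range(n_hotels) if (acc >> i) & 1]

def add_hotel(bitsets: Dict[str, int], index: int, props: Iterable[str]) -> None:
    """Insert or update a hotel by setting its bit in each of its property bitsets."""
    for p in props:
        bitsets[p] = bitsets.get(p, 0) | (1 << index)

BITSETS = build_bitsets(ROWS)
print(format(BITSETS["p2"], "04b")[::-1])                 # hotel1..hotel4: 0010
print(hotels_with_all(BITSETS, ["p1", "pn"], len(ROWS)))  # [1, 3] -> hotel2, hotel4
```

Inserting or updating a hotel only sets bits in the affected property bitsets, which illustrates the "easy update" and "easy insert" points; removing a property from a hotel would clear one bit the same way.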
Slide 10
Data Preparation
BitSet setting, Virtuoso adaptations:
16,507 stored properties → 63,109,198 B of RAM used
Virtuoso: 824 MB → 706 MB
Virtuoso set-up update: buffer size: 60,000
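A back-of-the-envelope check of these numbers, assuming each stored property is one bitset over the 71,600 hotels of the dataset from slide 6:

```latex
\[
\frac{63{,}109{,}198\ \text{B}}{16{,}507\ \text{properties}} \approx 3{,}823\ \text{B per property},
\qquad
\frac{71{,}600\ \text{bits}}{8\ \text{bits/B}} = 8{,}950\ \text{B per uncompressed bitset}
\]
```

The stored bitsets are thus on average well below the size of an uncompressed bitset over all hotels, which suggests (an assumption; the slides do not say) that they are compressed or trimmed. Since Virtuoso shrinks by 824 - 706 = 118 MB, Virtuoso plus the roughly 63 MB of bitsets stays below the original 824 MB.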
Slide 11
Implemented Process: Virtuoso plugin
(with kind help of the OpenLink team, GeoKnow project²)
1 interpret bif:contains (workaround!)
2 request bitsets from memcache via JNI (workaround!)
3 compute hotels using bit operations on addressed bitsets
4 map hotel IDs to Virtuoso literal IDs (workaround!)
   query IDs from Virtuoso via literal selection
   requires special predicate for each hotel resource
5 return cursor on result set
² This work has been supported by grants from the European Union's 7th Framework Programme provided for the project GeoKnow (GA no. 318159), cf. http://geoknow.eu
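The five steps can be sketched as a small Python pipeline. This is not the actual plugin: it assumes a hypothetical `bif:contains` argument format such as `"56 AND 18"`, replaces memcache and the JNI bridge with a dictionary, and replaces the mapping to Virtuoso literal IDs with a hypothetical position-to-IRI table. Only the bit operations in step 3 and the cursor in step 5 follow the slide directly.

```python
from typing import Dict, Iterator, List

# Hypothetical stand-ins: the real plugin fetched bitsets from memcache via JNI
# and mapped hotels to Virtuoso literal IDs; here both are plain dictionaries.
BITSET_STORE: Dict[str, int] = {   # feature key -> bitset over hotel positions
    "56": 0b1110,
    "18": 0b0110,
}
HOTEL_IRI: Dict[int, str] = {      # hotel position -> resource (hypothetical IRIs)
    1: "uor:hotel2", 2: "uor:hotel3", 3: "uor:hotel4",
}

def parse_contains_argument(arg: str) -> List[str]:
    """Step 1 (assumed format): read feature keys from a string like "56 AND 18"."""
    return [tok for tok in arg.split() if tok.upper() != "AND"]

def fetch_bitsets(keys: List[str]) -> List[int]:
    """Step 2: look up one bitset per feature; a missing feature is an empty set."""
    return [BITSET_STORE.get(k, 0) for k in keys]

def intersect(bitsets: List[int]) -> int:
    """Step 3: bitwise AND of all requested bitsets (no features: empty result)."""
    if not bitsets:
        return 0
    acc = bitsets[0]
    for b in bitsets[1:]:
        acc &= b
    return acc

def cursor(arg: str) -> Iterator[str]:
    """Steps 4-5: map set bit positions to resource IDs and yield them lazily."""
    acc = intersect(fetch_bitsets(parse_contains_argument(arg)))
    pos = 0
    while acc:
        if acc & 1:
            yield HOTEL_IRI[pos]
        acc >>= 1
        pos += 1

print(list(cursor("56 AND 18")))  # ['uor:hotel2', 'uor:hotel3']
```

Returning a generator keeps step 5 lazy, so a caller that only needs the first few rows never decodes the remaining bits.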
Slide 12
Preliminary Results of A: properties in BitSets
Observations:
more complex queries → shorter response times
stable response times
warmup required
Slide 13
Preliminary Results of B: non-selective property in Virtuoso
Observations:
the less selective feature, answered within Virtuoso, has the largest impact on computation time
Slide 14
Preliminary Results of C: order by
Observations:
sorting is not done in the BitSet, but might be possible to implement in the future
Slide 15
Preliminary Results of D: limit 10
Observations:
limiting is not done in the BitSet, but might be possible to implement in the future
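The slides leave ORDER BY and LIMIT to Virtuoso. One way they might be moved into the bitset layer, sketched here as an assumption rather than anything the slides describe: LIMIT k stops after the first k set bits, and if hotel positions were assigned in ascending order of a sort key (a hypothetical price here), ascending bit order would already be the sorted order.

```python
from itertools import islice
from typing import Iterator, List

def set_bits(bitset: int) -> Iterator[int]:
    """Yield positions of set bits in ascending order."""
    while bitset:
        low = bitset & -bitset        # isolate the lowest set bit
        yield low.bit_length() - 1    # its position
        bitset ^= low                 # clear it

def limit(bitset: int, k: int) -> List[int]:
    """LIMIT k: stop after the first k matches instead of decoding all of them."""
    return list(islice(set_bits(bitset), k))

# Hypothetical ORDER BY support: hotels numbered by ascending price, so that
# ascending bit order equals ascending price order.
PRICES_BY_POSITION = [49, 65, 80, 120, 150]
matches = 0b11010                                            # positions 1, 3, 4
print(limit(matches, 2))                                     # [1, 3]
print([PRICES_BY_POSITION[i] for i in limit(matches, 2)])    # [65, 120]
```

The catch is that such a numbering must be maintained on every insert or price change, which works against the "easy insert" advantage from slide 9, and one numbering supports only one sort key.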
Slide 16
Discussion
Summary:
proven good performance
query time is robust
very resource efficient
no schema required
→ if a star pattern is recognizable, then use the bitset optimization
ToDos (not production ready):
overcome workarounds
tighten the integration
generalize interface
extend to Elasticsearch (→ Virtuoso with full-text index cluster)
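Recognizing a star pattern could look roughly like this: all triple patterns share one subject variable, and those whose predicate is a designated feature predicate with a constant object can be answered from bitsets, while the rest go to Virtuoso. A sketch under assumptions; the triple representation, `star_center`, `split_for_bitsets` and the choice of `uo:hasFeature` as the bitset predicate are hypothetical.

```python
from typing import List, Optional, Set, Tuple

# (subject, predicate, object); terms starting with "?" are variables.
Triple = Tuple[str, str, str]

def is_var(term: str) -> bool:
    return term.startswith("?")

def star_center(patterns: List[Triple]) -> Optional[str]:
    """Return the shared subject variable if all patterns form a star, else None."""
    subjects = {s for s, _, _ in patterns}
    if len(subjects) == 1:
        center = next(iter(subjects))
        if is_var(center):
            return center
    return None

def split_for_bitsets(patterns: List[Triple], bitset_predicates: Set[str]):
    """Split a star into bitset-answerable patterns and the remainder."""
    if star_center(patterns) is None:
        return [], list(patterns)  # not a star: leave everything to the triple store
    bitset_part = [t for t in patterns
                   if t[1] in bitset_predicates and not is_var(t[2])]
    rest = [t for t in patterns if t not in bitset_part]
    return bitset_part, rest

# Star pattern of the example query (slide 5), shortened to two features.
QUERY = [
    ("?s", "rdf:type", "uo:Hotel"),
    ("?s", "uo:hasFeature", "uorf:56"),
    ("?s", "uo:hasFeature", "uorf:18"),
    ("?s", "uo:locatedIn", "uor:Europe"),
    ("?s", "uo:suitableFor", "uos:Family"),
]
bits, rest = split_for_bitsets(QUERY, {"uo:hasFeature"})
print(len(bits), len(rest))  # 2 3
```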
Slide 17
Take Away Messages
e-commerce use case requires short and robust request times
BitSet-driven extension has proven its value
→ basic requirements of e-commerce scenario fulfilled
→ still flexible (schemaless), but performant
taking advantage of external data structures is possible (in Virtuoso)
Dr. Andreas Both
Head of Research and Development
Unister GmbH, Leipzig, Germany
+49 341 65050 24496
http://www.unister.de