syncnorwich
• Aviva have a number of brands/channels to market including insurance aggregators (e.g. CompareThe Market, GoCompare…)
• The raw aggregator quote data is of a scale to present a ‘Big Data’ problem – there is great potential for gaining additional insights from this data
So…
• Define some candidate business questions
• Test them against significant volumes of data
• Measure cluster size/£/time performance
Introduction…
The example Aviva Use Case…
Some pig…
Query B: ~10 million quotes (5m each channel). Joining quote data across different channels.
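-- Register the custom UDF jar and its XML-parsing dependency
register 's3n://ashaw-1/jars/myudfs.jar';
register 's3n://ashaw-1/jars/dom4j-1.6.1.jar';
-- Load the raw quote data for each channel and cap each at 5 million rows
A = load 's3n://ashaw-1/Intermediate/duplicated/lots' using PigStorage();
Arac = load 's3n://ashaw-1/Intermediate/duplicated/lotsrac' using PigStorage();
A1 = limit A 5000000;
Arac1 = limit Arac 5000000;
-- Flatten the XML held in column 5 of each row into a tuple of fields
B = foreach A1 generate myudfs.Flatten((chararray)$5);
Brac = foreach Arac1 generate myudfs.Flatten2((chararray)$5);
-- Join the two channels on field 21, then keep pairs where field 0 is 1 on either side
C = join B by (chararray)($0.$21), Brac by (chararray)($0.$21);
D = filter C by $1.$0 == 1 OR $0.$0 == 1;
STORE D INTO 's3n://ashaw-1/myoutputfolder/';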
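For readers who don't speak Pig, the join-then-filter step of Query B can be sketched in plain Python on toy data. This is only an illustration of the logic, not the code that ran on the cluster: the function name `join_and_filter`, the two-field toy rows and the key position are hypothetical, and the real flattened tuples (produced by the `myudfs` UDFs, whose internals are not shown in the deck) use field 21 as the join key.

```python
from collections import defaultdict

def join_and_filter(left, right, key_idx, flag_idx=0):
    """Inner-join two lists of tuples on key_idx, then keep only the
    pairs where the flag field equals 1 on either side."""
    # Build a hash index over the right-hand channel
    index = defaultdict(list)
    for row in right:
        index[row[key_idx]].append(row)

    matched = []
    for l_row in left:
        for r_row in index.get(l_row[key_idx], []):
            # Mirrors: filter C by $1.$0 == 1 OR $0.$0 == 1
            if l_row[flag_idx] == 1 or r_row[flag_idx] == 1:
                matched.append((l_row, r_row))
    return matched

# Toy rows of the form (flag, join_key); hypothetical stand-ins for flattened quotes
channel_a = [(1, "k1"), (0, "k2"), (0, "k3")]
channel_b = [(0, "k1"), (0, "k2"), (1, "k3"), (1, "k9")]

matched = join_and_filter(channel_a, channel_b, key_idx=1)
for pair in matched:
    print(pair)
```

Only the "k1" and "k3" pairs survive: "k2" joins but neither side has the flag set, and "k9" has no partner in the other channel.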
XML Flattening results:
• 10 Million quotes:
Costs per run…
Cluster size        Time to execute   Approx. cost
10 x Small nodes    64 minutes        11 compute hours - $1.155 per hour (approx. £0.72)
19 x Small nodes    31 minutes        20 compute hours - $2.10 per hour (approx. £1.30)
8 x Large nodes     19 minutes        8 compute hours - $3.78 per hour (approx. £2.34)
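As a rough way to compare the three runs, the figures in the table can be put through some simple arithmetic. The sketch below takes the table at face value, i.e. it assumes the dollar figure is an hourly rate for the whole cluster and that "compute hours" is the number of node-hours billed per wall-clock hour. Both are readings of the slide rather than confirmed facts, and real billing granularity (for example rounding up to whole hours) is not modelled.

```python
# Figures copied from the table; the interpretation of each column is an assumption.
runs = [
    # (cluster, minutes to execute, compute hours, USD per cluster-hour)
    ("10 x Small", 64, 11, 1.155),
    ("19 x Small", 31, 20, 2.10),
    ("8 x Large", 19, 8, 3.78),
]

def summarise(cluster, minutes, compute_hours, usd_per_hour):
    return {
        "cluster": cluster,
        # Implied price of a single node-hour
        "usd_per_node_hour": round(usd_per_hour / compute_hours, 4),
        # Cost of the run if billed pro rata by the minute (hypothetical billing model)
        "pro_rata_usd": round(usd_per_hour * minutes / 60, 3),
    }

results = [summarise(*run) for run in runs]
for r in results:
    print(r)
```

Under these assumptions the three configurations come out at a broadly similar pro-rata cost per run, so the larger clusters mainly buy a shorter wait rather than a bigger bill.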
But we could have used spot instances…
• It will be a similar adoption pattern to cloud:
− Those organisations that make it work and gain additional business insights will
  • market more accurately
  • sell more
  • have less customer churn
  • have better paying customers
• Market forces will eventually force adoption or failure of their competitors – all other things being equal. It’s Darwinian evolutionary forces at work in the marketplace.
• Interestingly, the costs to exploit big data (well – at least to find out if there is some value that you are missing out on) are now very low due to vendors such as AWS, so it’s a market advantage that is relatively cheap to attain
− I.e. we’re talking about a few enabled, savvy staff and some “pay as you go” compute resources
Wrapping up…