In memory OLAP engine Samuel Pelletier Kaviju inc. [email protected]

OLAP?

• An acronym for OnLine Analytical Processing.

• In simple words, a system to query a multidimensional data set and get answers fast enough for interactive reports.

• A well-known implementation is the Excel pivot table.

Why build something new

• I wanted something fast and memory-efficient for simple queries over millions of facts.

• SQL queries do not work well on millions of facts with multiple dimensions, especially with a large number of rows.

• There are specialized OLAP tools from Microsoft, Oracle and others, but they are large and expensive, too much for my needs.

• Generic, inexpensive toolkits are not memory-efficient; that is the price of their simplicity.

• I wanted a simple solution to deploy, with minimal dependencies.

Memory usage and time to retrieve 1 000 000 invoice lines

• Fetching EOs uses 1.2 GB of RAM in 13-19 s.

• Fetching raw rows uses 750 MB of RAM in 5-8 s.

• Fetching as POJOs with JDBC uses 130 MB in 4.0 s.

• Reading from a file as POJOs uses 130 MB in 1.4 s.

• For 7 M rows, EOs would require 8.4 GB, spread over a huge number of small objects (bad for the GC).
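The measurements above can be turned into a per-row cost and extrapolated to 7 M rows, which is where the 8.4 GB figure comes from. This is a small sketch, assuming decimal units (1 MB = 1 000 000 bytes, 1 GB = 1 000 MB) and memory that grows linearly with the row count; the class and method names are illustrative only.

```java
// Per-row memory cost derived from the 1 000 000-row measurements above.
// Assumes decimal units (1 MB = 1 000 000 bytes) and linear growth with row count.
class MemoryEstimate {
    static final long MEASURED_ROWS = 1_000_000L;

    // Average bytes per row for a total measured in megabytes.
    static long bytesPerRow(double measuredMegabytes) {
        return Math.round(measuredMegabytes * 1_000_000L / MEASURED_ROWS);
    }

    // Extrapolated total, in gigabytes, for the given number of rows.
    static double gigabytesFor(double measuredMegabytes, long rows) {
        return measuredMegabytes * rows / MEASURED_ROWS / 1_000.0;
    }

    public static void main(String[] args) {
        String[] labels = {"EOs", "raw rows", "POJOs"};
        double[] measuredMb = {1_200, 750, 130};
        for (int i = 0; i < labels.length; i++) {
            System.out.printf("%-8s %5d bytes/row, %.2f GB for 7 M rows%n",
                    labels[i], bytesPerRow(measuredMb[i]),
                    gigabytesFor(measuredMb[i], 7_000_000L));
        }
    }
}
```

Under these assumptions the program reports 1200, 750 and 130 bytes per row, and 8.40, 5.25 and 0.91 GB for 7 M rows.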

Time to compute sum of sales for 1 000 000 invoice lines

• 2.1 s for "select sum(sales)..." in FrontBase with the table in RAM.

• 0.5 s for @sum.sales on EOs.

• 0.12 s for @sum.sales on raw rows.

• 0.5 s for @sum.sales on POJOs.

• 0.009 s for a loop with direct attribute access on POJOs.
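The 0.009 s figure comes from a plain loop that reads a field directly, with no key-value coding lookup per object. Below is a minimal sketch of that kind of loop, assuming a hypothetical InvoiceLine stand-in with a public float sales field (modeled loosely on the Value class shown later). It accumulates in a double to limit rounding error over a million additions; no timing result is claimed here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical fact POJO: one public field per measure, no accessors.
class InvoiceLine {
    public float sales;

    InvoiceLine(float sales) {
        this.sales = sales;
    }
}

class DirectSum {
    // Sums the sales measure with direct field access in a plain loop.
    // A double accumulator avoids most float rounding error on large inputs.
    static double sumSales(List<InvoiceLine> lines) {
        double total = 0.0;
        for (int i = 0, n = lines.size(); i < n; i++) {
            total += lines.get(i).sales;
        }
        return total;
    }

    public static void main(String[] args) {
        List<InvoiceLine> lines = new ArrayList<>();
        for (int i = 0; i < 1_000_000; i++) {
            lines.add(new InvoiceLine(1.5f));
        }
        long start = System.nanoTime();
        double total = sumSales(lines);
        long elapsedMicros = (System.nanoTime() - start) / 1_000;
        System.out.println("sum = " + total + " in " + elapsedMicros + " us");
    }
}
```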

Some concepts

• Facts are the elements being analyzed; invoice lines are an example.

• Facts contain measures such as quantities, prices or amounts.

• Facts are linked to dimensions used to filter and aggregate them. For invoice lines, these are the product, invoice, date, etc.

• Dimensions are often part of a hierarchy: for example, products belong to a product category, and dates belong to a month and to a week.
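To make the date hierarchy concrete, here is a sketch that derives the month and the week a date belongs to, the keys a month or week dimension would aggregate on. It uses java.time (Java 8+); the ISO-8601 week rule and the year * 100 + week encoding are assumptions, since the slides do not say how weeks are defined.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.temporal.IsoFields;

// Derives the parent keys of a date in Date -> Month and Date -> Week hierarchies.
// ISO-8601 week numbering is an assumption; another week rule would change weekKey.
class DateHierarchy {
    // Month key, for example 2012-12 for any day of December 2012.
    static YearMonth monthKey(LocalDate date) {
        return YearMonth.from(date);
    }

    // Week key encoded as weekBasedYear * 100 + weekNumber, for example 201301.
    static int weekKey(LocalDate date) {
        int weekYear = date.get(IsoFields.WEEK_BASED_YEAR);
        int week = date.get(IsoFields.WEEK_OF_WEEK_BASED_YEAR);
        return weekYear * 100 + week;
    }

    public static void main(String[] args) {
        LocalDate date = LocalDate.of(2012, 12, 31);
        System.out.println(date + " -> month " + monthKey(date) + ", week " + weekKey(date));
    }
}
```

The sample date shows why month and week are separate parents rather than one chain: December 31, 2012 belongs to month 2012-12 but to ISO week 1 of 2013.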

Sample Invoice dimension hierarchy

[Diagram: dimension hierarchy of an invoice line, with the nodes InvoiceLine, Invoice, Date, Month, Week, Ship to, Sold to, Client type, Product, Salesman and SalesManager]

Measures: Shipped Qty, Sales, Profits

Steps to implement an engine

• Create the Engine class.

• Create the classes required to model the dimension hierarchy.

• Create the Value class for your facts.

• Create the GroupEntry class that will compute summarized results.

• Create the dimension definition classes.

Engine class

• The engine class extends OlapEngine, parameterized with the GroupEntry and Value types:

  public class SalesEngine extends OlapEngine<Sales, InvoiceLine>

• Create the objects required for the data model and the lookup tables used to load the facts.

• Load the facts into Value objects.

• Create and register the dimensions.

Create required model objects

public class Product {
    public final int code;
    public final String name;
    public final ProductCategory category;

    public Product(int code, String name, ProductCategory category) {
        this.code = code;
        this.name = name;
        this.category = category;
    }
}

private void loadProducts() {
    productsByCode = new HashMap<Integer, Product>();

    WOResourceManager resourceManager = ERXApplication.application().resourceManager();
    String fileName = "olapData/products.txt";
    // Both the stream and the reader are closed automatically.
    try (InputStream fileData = resourceManager.inputStreamForResourceNamed(fileName, null, null);
         BufferedReader reader = new BufferedReader(new InputStreamReader(fileData, StandardCharsets.UTF_8))) {
        String line;
        while ((line = reader.readLine()) != null) {
            String[] cols = line.split("\t", -1);
            Product product = new Product(Integer.parseInt(cols[0]), cols[0], categoryWithID(cols[1]));
            productsByCode.put(product.code, product);
        }
    }
    ...
}

Load the facts and create dimensions

private void loadInvoiceLines() {
    ...
        loadProductCategories();
        loadProducts();

        InvoiceDimension invoiceDim = new InvoiceDimension(this);
        SalesmanDimension salesmanDim = new SalesmanDimension(this);
        while ((line = reader.readLine()) != null) {
            String[] cols = line.split("\t", -1);

            InvoiceLine invoiceLine = new InvoiceLine(valueIndex++, Short.parseShort(cols[1]));
            invoiceLine.shippedQty = Integer.parseInt(cols[6]);
            invoiceLine.sales = Float.parseFloat(cols[7]);
            invoiceLine.profits = Float.parseFloat(cols[8]);
            lines.add(invoiceLine);
            invoiceDim.addLine(invoiceLine, cols[0], cols);

            invoiceLine.salesmanNumber = Integer.parseInt(cols[12]);
            salesmanDim.addIndexEntry(invoiceLine.salesmanNumber, invoiceLine);
            ...
        }
    }   // end of the block opened in the elided code above
    addDimension(productDimension);
    addDimension(productDimension.createProductCategoryDimension());
    ...
    lines.trimToSize();
    setValues(lines);
}
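The loading pattern above (split each tab-separated line with split("\t", -1), resolve references through lookup tables built first, then create compact fact objects) can be tried outside WebObjects. Below is a self-contained sketch, assuming a hypothetical simplified layout: products as code<TAB>name and invoice lines as productCode<TAB>shippedQty<TAB>sales. The real files in the slides use a different, partly elided column layout, and all class names here are illustrative.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified column layouts (not the slides' real files):
//   products:      code \t name
//   invoice lines: productCode \t shippedQty \t sales
class TsvLoader {
    static final class Product {
        final int code;
        final String name;
        Product(int code, String name) { this.code = code; this.name = name; }
    }

    static final class Line {
        final Product product; // resolved once at load time, not at query time
        final int shippedQty;
        final float sales;
        Line(Product product, int shippedQty, float sales) {
            this.product = product; this.shippedQty = shippedQty; this.sales = sales;
        }
    }

    // Builds the lookup table used to resolve product codes while loading facts.
    static Map<Integer, Product> loadProducts(Reader source) throws IOException {
        Map<Integer, Product> byCode = new HashMap<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cols = line.split("\t", -1); // -1 keeps trailing empty columns
                Product p = new Product(Integer.parseInt(cols[0]), cols[1]);
                byCode.put(p.code, p);
            }
        }
        return byCode;
    }

    static List<Line> loadLines(Reader source, Map<Integer, Product> products) throws IOException {
        ArrayList<Line> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] cols = line.split("\t", -1);
                Product product = products.get(Integer.parseInt(cols[0]));
                if (product == null) {
                    throw new IOException("Unknown product code: " + cols[0]);
                }
                lines.add(new Line(product, Integer.parseInt(cols[1]), Float.parseFloat(cols[2])));
            }
        }
        lines.trimToSize(); // release unused backing-array capacity once loading is done
        return lines;
    }

    public static void main(String[] args) throws IOException {
        Map<Integer, Product> products = loadProducts(new StringReader("10\tWidget\n20\tGadget\n"));
        List<Line> lines = loadLines(new StringReader("10\t3\t29.97\n20\t1\t5.50\n"), products);
        System.out.println(lines.size() + " lines, first product: " + lines.get(0).product.name);
    }
}
```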

Value and GroupEntry classes

• The Value class contains your basic facts (invoice lines, for example):

  public class InvoiceLine extends OlapValue<Sales>

• The GroupEntry class is used to compute summarized results:

  public class Sales extends GroupEntry<InvoiceLine>

• These classes are tightly coupled: a GroupEntry represents a result computed over an array of Values, so the measures appear in both classes.
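The coupling can be seen in a stripped-down version of the grouping step: each fact is routed to the group entry for its key, and the entry adds the fact's measures to its totals, which is the role addEntry plays in the Sales class. This sketch does not use the OlapEngine API; the class names, the grouping by product code and the HashMap-based routing are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical Value class: one fact with its measures and a dimension key.
class Fact {
    final int productCode;
    final int shippedQty;
    final float sales;

    Fact(int productCode, int shippedQty, float sales) {
        this.productCode = productCode;
        this.shippedQty = shippedQty;
        this.sales = sales;
    }
}

// Hypothetical GroupEntry: the same measures, summed over many facts.
class SalesGroup {
    final int productCode;
    int shippedQty;
    double sales; // wider type than the fact's float, to accumulate safely

    SalesGroup(int productCode) {
        this.productCode = productCode;
    }

    void addEntry(Fact fact) {
        shippedQty += fact.shippedQty;
        sales += fact.sales;
    }
}

class Grouping {
    // Routes each fact to the group for its key, creating groups on demand.
    static Collection<SalesGroup> groupByProduct(List<Fact> facts) {
        Map<Integer, SalesGroup> groups = new LinkedHashMap<>();
        for (Fact fact : facts) {
            groups.computeIfAbsent(fact.productCode, code -> new SalesGroup(code)).addEntry(fact);
        }
        return groups.values();
    }

    public static void main(String[] args) {
        List<Fact> facts = new ArrayList<>();
        facts.add(new Fact(10, 2, 20.0f));
        facts.add(new Fact(20, 1, 5.5f));
        facts.add(new Fact(10, 1, 10.0f));
        for (SalesGroup g : groupByProduct(facts)) {
            System.out.println("product " + g.productCode + ": qty " + g.shippedQty + ", sales " + g.sales);
        }
    }
}
```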

Value Class

public class InvoiceLine extends OlapValue<Sales> {
    public Invoice invoice;
    public final short lineNumber;
    public Product product;

    public int shippedQty;
    public float sales;
    public float profits;

    public int salesmanNumber;
    public int salesManagerNumber;

    public InvoiceLine(int valueIndex, short lineNumber) {
        super(valueIndex);
        this.lineNumber = lineNumber;
    }
}

GroupEntry class

public class Sales extends GroupEntry<InvoiceLine> {
    private int shippedQty;
    private double sales = 0.0;
    private double profits = 0.0;

    public Sales(GroupEntryKey<Sales, InvoiceLine> key) {
        super(key);
    }

    @Override
    public void addEntry(InvoiceLine entry) {
        shippedQty += entry.shippedQty;
        sales += entry.sales;
        profits += entry.profits;
    }

    @Override
    public void optimizeMemoryUsage() {
    }

    ... // accessor for sales; its declaration is cut off in the slide
        return sales;
    }

    ...
}

Dimension classes

• Dimensions implement the engine's indexes and the key extraction used to aggregate results.

• Dimensions are usually linked to another class representing an entity such as Invoice, Client, Product or ProductCategory.

• Entities are plain value objects (POJOs) for optimal speed and memory usage. You may add a method to get the corresponding EO.

• A dimension is either a leaf dimension (each entry groups facts) or a group dimension (each entry groups leaf entries).
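The two kinds of dimension can be pictured as two maps. A leaf dimension indexes facts by key (here, the positions of invoice lines per product code); a group dimension maps each of its keys to child leaf keys (product category to product codes, as createProductCategoryDimension does with addIndexEntry), so a filter on a category expands to the union of its products' facts. This is a sketch of the idea only, not the internals of OlapLeafDimension or OlapGroupDimension, which the slides do not show; all names are hypothetical, and boxed Integer lists are used for brevity where a memory-conscious version would use int arrays.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical leaf dimension: key -> indexes of the facts with that key.
class LeafIndex {
    private final Map<Integer, List<Integer>> factsByKey = new HashMap<>();

    void addIndexEntry(int key, int factIndex) {
        factsByKey.computeIfAbsent(key, k -> new ArrayList<>()).add(factIndex);
    }

    List<Integer> factsFor(int key) {
        return factsByKey.getOrDefault(key, Collections.emptyList());
    }
}

// Hypothetical group dimension: group key -> child leaf keys.
class GroupIndex {
    private final Map<Integer, List<Integer>> childKeysByKey = new HashMap<>();
    private final LeafIndex child;

    GroupIndex(LeafIndex child) {
        this.child = child;
    }

    void addIndexEntry(int groupKey, int childKey) {
        childKeysByKey.computeIfAbsent(groupKey, k -> new ArrayList<>()).add(childKey);
    }

    // Expands a group key into the facts of all its child entries.
    List<Integer> factsFor(int groupKey) {
        List<Integer> result = new ArrayList<>();
        for (int childKey : childKeysByKey.getOrDefault(groupKey, Collections.emptyList())) {
            result.addAll(child.factsFor(childKey));
        }
        return result;
    }
}

class DimensionDemo {
    public static void main(String[] args) {
        LeafIndex products = new LeafIndex();
        int[] productOfFact = {10, 20, 10, 30}; // product code of facts 0..3
        for (int i = 0; i < productOfFact.length; i++) {
            products.addIndexEntry(productOfFact[i], i);
        }
        GroupIndex categories = new GroupIndex(products);
        categories.addIndexEntry(1, 10); // category 1 contains products 10 and 20
        categories.addIndexEntry(1, 20);
        categories.addIndexEntry(2, 30);
        System.out.println("category 1 facts: " + categories.factsFor(1));
    }
}
```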

Product dimension class

public class ProductDimension extends OlapLeafDimension<Sales, Integer, InvoiceLine> {

    public ProductDimension(OlapEngine<Sales, InvoiceLine> engine) {
        super(engine, "productCode");
    }

    @Override
    public Integer getKeyForEntry(InvoiceLine entry) {
        return entry.product.code;
    }

    @Override
    public Integer getKeyForString(String keyString) {
        return Integer.valueOf(keyString);
    }

    public ProductCategoryDimension createProductCategoryDimension() {
        long startTime = System.currentTimeMillis();
        ProductCategoryDimension dimension = new ProductCategoryDimension(engine, this);

        for (Product product : salesEngine().products()) {
            dimension.addIndexEntry(product.category.categoryID, product.code);
        }
        long fetchTime = System.currentTimeMillis() - startTime;
        engine.logMessage("createProductCategoryDimension completed in " + fetchTime + "ms.");
        return dimension;
    }

    private SalesEngine salesEngine() {
        return (SalesEngine) engine;
    }
}

Product category dimension class

public class ProductCategoryDimension
        extends OlapGroupDimension<Sales, Integer, InvoiceLine, ProductDimension, Integer> {

    public ProductCategoryDimension(OlapEngine<Sales, InvoiceLine> engine, ProductDimension childDimension) {
        super(engine, "productCategoryCode", childDimension);
    }

    @Override
    public Integer getKeyForEntry(InvoiceLine entry) {
        return entry.product.category.categoryID;
    }

    @Override
    public Integer getKeyForString(String keyString) {
        return Integer.valueOf(keyString);
    }
}

Initialize and use in an app

• Once loaded, the engine can be queried from multiple threads.

• I usually create a singleton for the engine; it can also live in your application class.

Use in an application

public Application() {
    ...
    SalesEngine.createEngine();
}

// In the component that uses the engine:
public OlapNavigator(WOContext context) {
    super(context);
    ...
    engine = SalesEngine.sharedEngine();
    if (engine == null) {
        // The engine may be null if it has not finished loading...
    }
}

private void someFetchMethod() {
    OlapResult<Sales, InvoiceLine> result = engine.resultForRequest(query);

    rows = new NSArray<Sales>(result.getGroups());
    // Sort, or put inside an ERXDisplayGroup...
}
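The createEngine()/sharedEngine() pair above implies an engine that loads in the background and is published only once complete, which is why sharedEngine() can return null. One way to get that behavior, assuming the engine is never modified after loading: build it on a separate thread and publish it through a volatile field. The two method names come from the slide; the holder class, the stand-in Engine class and the awaitEngine() helper are hypothetical.

```java
import java.util.concurrent.CountDownLatch;

// Stand-in for the real engine: built once, then only read (assumed immutable).
final class Engine {
    final int valueCount;

    Engine(int valueCount) {
        this.valueCount = valueCount;
    }
}

final class EngineHolder {
    // volatile: once assigned, the fully built engine is visible to all request threads.
    private static volatile Engine sharedEngine;
    private static final CountDownLatch loaded = new CountDownLatch(1);

    private EngineHolder() {
    }

    // Called once at application startup; returns immediately.
    static void createEngine() {
        Thread loader = new Thread(() -> {
            Engine engine = new Engine(1_000_000); // hypothetical: load facts, build dimensions
            sharedEngine = engine;                  // publish only when complete
            loaded.countDown();
        }, "olap-engine-loader");
        loader.setDaemon(true);
        loader.start();
    }

    // Returns null while loading is still in progress.
    static Engine sharedEngine() {
        return sharedEngine;
    }

    // Optional helper for callers that prefer to block until loading completes.
    static Engine awaitEngine() throws InterruptedException {
        loaded.await();
        return sharedEngine;
    }

    public static void main(String[] args) throws InterruptedException {
        createEngine();
        System.out.println("right after createEngine: " + (sharedEngine() == null ? "still loading" : "ready"));
        System.out.println("after waiting: " + awaitEngine().valueCount + " values");
    }
}
```

Because request threads only read the engine after it is published, no locking is needed on the query path, which matches the slide's point that the engine is multithread capable once loaded.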

Demo app

Java and memory

• To keep the garbage collector happy, set the maximum heap to at least 2 to 3 times the actual memory usage.

• The GC can cripple your application's performance when memory is scarce. With default settings, at least in Java 1.5 and 1.6, it may even bring down the server by keeping multiple cores busy for long periods.

• Java 1.7 includes a new collector, G1, which is probably better.
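The 2-3x rule of thumb can be checked from inside the application by comparing the maximum heap with the memory currently in use, via java.lang.Runtime. This is a sketch; the threshold of 2 follows the low end of the rule above, and the class and method names are illustrative. Used memory measured this way includes garbage that has not been collected yet, so it over-estimates the real usage, and maxMemory() returns Long.MAX_VALUE when the heap has no limit.

```java
// Compares the maximum heap (-Xmx) with the heap currently in use.
class HeapHeadroom {
    static final double MIN_RATIO = 2.0; // low end of the 2-3x rule of thumb

    // Bytes currently used in the heap (includes not-yet-collected garbage).
    static long usedBytes(Runtime rt) {
        return rt.totalMemory() - rt.freeMemory();
    }

    // Ratio of the maximum heap to the given usage.
    static double headroomRatio(long maxBytes, long usedBytes) {
        return (double) maxBytes / usedBytes;
    }

    static boolean hasEnoughHeadroom(long maxBytes, long usedBytes) {
        return headroomRatio(maxBytes, usedBytes) >= MIN_RATIO;
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long max = rt.maxMemory();
        long used = usedBytes(rt);
        System.out.printf("max heap %d MB, used %d MB, ratio %.1f%s%n",
                max / (1024 * 1024), used / (1024 * 1024), headroomRatio(max, used),
                hasEnoughHeadroom(max, used) ? "" : " (consider a larger -Xmx)");
    }
}
```

For example, with the 130 MB POJO data set measured earlier, the rule suggests a maximum heap of roughly 260 to 390 MB for that data alone.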

Q&A

Samuel Pelletier [email protected]