BalanceLine4j Framework Overview

This presentation is an overview of the BalanceLine4j project, an implementation of the Balance Line algorithm for Java applications.

Page 1: BalanceLine4j Framework Overview

http://gibaholms.wordpress.com/

BalanceLine4j Framework Overview

Revision: 01

Gilberto Augusto Holms

[email protected]

@gibaholms

Page 2: BalanceLine4j Framework Overview

About me...

Gilberto Augusto Holms

Java and SOA Architect

Expertise: Java, EAI, SOA, BPEL, BPM, Oracle Fusion Middleware

Interests: Open Source, Artificial Intelligence, Innovation

Twitter: @gibaholms

Blog: http://gibaholms.wordpress.com/

Certifications: SCJA, SCJP, SCWCD, SCBCD, SCDJWS, OCE WLP 10g

Page 3: BalanceLine4j Framework Overview

Balance Line Algorithm

What is “Balance Line”?

Balance Line is an algorithm: a computational technique for coordinating the processing of massive sequential data sets.

Page 4: BalanceLine4j Framework Overview

Balance Line Algorithm

What is “Sequential Data”?

Sequential data are large data sets, from one or more data sources, that share a common key and arrive ordered by that key.

Page 5: BalanceLine4j Framework Overview

Balance Line Algorithm

Why use it?

Improves processing performance: each data source is read once, sequentially

Saves computational resources: records are streamed, so little memory is needed

Page 6: BalanceLine4j Framework Overview

Balance Line Algorithm

When to use it?

Data synchronization (like syncing an iPod)

Data loading (full or partial)

Data reconciliation

Page 7: BalanceLine4j Framework Overview

Case Study

Company “X” keeps in its database a large table with the main information (number, address, contacts) about every bank and agency in the country. Every day the company receives from the Central Bank a huge text file with the latest agency data, in which the following conditions may occur:

Data update (changes to number, address, contacts and so on)

Agency no longer exists

New agency added

Our job is to develop software that keeps this table up to date by processing the file and synchronizing the record changes.

Page 8: BalanceLine4j Framework Overview

Dummy Solution

Begin

For each text file line:

Check if the agency exists

Exists? No: INSERT. Yes: check if the agency data changed

Data changed? Yes: UPDATE. No: nothing to do

End of file? No: go to the next line. Yes: for each record that no longer exists, DELETE

End
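The flow above can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class name DummySync is hypothetical, and two in-memory maps (key to record data) stand in for the database table and the text file. In the real scenario every lookup marked below would be a separate query against the database.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the per-record approach; the maps are stand-ins
// for the database table and the daily text file (assumption).
class DummySync {

    // Applies the file records to the table and returns {inserts, updates, deletes}.
    static int[] sync(Map<String, String> table, Map<String, String> file) {
        int inserts = 0, updates = 0, deletes = 0;
        Set<String> seen = new HashSet<>();

        for (Map.Entry<String, String> line : file.entrySet()) {
            seen.add(line.getKey());
            String current = table.get(line.getKey());      // one lookup per file line
            if (current == null) {
                table.put(line.getKey(), line.getValue());   // INSERT
                inserts++;
            } else if (!current.equals(line.getValue())) {
                table.put(line.getKey(), line.getValue());   // UPDATE
                updates++;
            }
        }

        // After the end of the file: delete every record that was not in it.
        Set<String> stale = new HashSet<>(table.keySet());
        stale.removeAll(seen);
        for (String key : stale) {
            table.remove(key);                                // DELETE
            deletes++;
        }
        return new int[] {inserts, updates, deletes};
    }
}
```

The point of the sketch is the shape of the work: one lookup per file line, plus an extra pass to find the deletions.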

Page 9: BalanceLine4j Framework Overview

Balance Line Algorithm Concepts

Master File: the main data set. It represents the final, persistent view of the data: the reference, the origin.

Transaction File: the secondary data set. It represents the transactions made and contains the data that must be synchronized with the origin.

Key: a unique identifier for a single record (it can be a single field, a combination of fields, a SHA-1 hash and so on).

[Diagram: a Master source and one or more Transaction sources feeding BalanceLine]

Page 10: BalanceLine4j Framework Overview

Balance Line Algorithm Concepts

The big secret...

SORTING BY KEY!

Page 11: BalanceLine4j Framework Overview

Balance Line Algorithm – Step by Step

1 – Identify one unique key

Master: 10, 5, 20, 17

Transaction: 3, 10, 18, 17

(each key is followed by the rest of its record)

Page 12: BalanceLine4j Framework Overview

Balance Line Algorithm – Step by Step

2 – Sort the data sources (ascending)

Master: 5, 10, 17, 20

Transaction: 3, 10, 17, 18
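As a small illustration of this step, the sketch below sorts in-memory records by key in ascending order. The names KeyedRecord and SortByKey are hypothetical, and an integer key is assumed only to match the example numbers. Both sources must be sorted with the same ordering for the later comparison to be meaningful.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical record type: a key plus the rest of the record (assumption).
class KeyedRecord {
    final int key;
    final String data;

    KeyedRecord(int key, String data) {
        this.key = key;
        this.data = data;
    }
}

class SortByKey {
    // Returns a copy sorted ascending by key; the input list is left untouched.
    static List<KeyedRecord> sortedByKey(List<KeyedRecord> records) {
        List<KeyedRecord> copy = new ArrayList<>(records);
        copy.sort(Comparator.comparingInt((KeyedRecord r) -> r.key));
        return copy;
    }
}
```

For data that already lives in a database, the same effect is usually obtained with an “order by” clause instead of sorting in memory.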

Page 13: BalanceLine4j Framework Overview

Balance Line Algorithm – Step by Step

3 – Prepare two “pointers”

Master: 5, 10, 17, 20

Transaction: 3, 10, 17, 18

The master pointer (M) starts at key 5 and the transaction pointer (T) starts at key 3.

Page 14: BalanceLine4j Framework Overview

Balance Line Algorithm – Step by Step

4 – Begin key comparison

Master: 5, 10, 17, 20

Transaction: 3, 10, 17, 18

KM is the key under the master pointer and KT is the key under the transaction pointer.

KM = 5, KT = 3. KM > KT: INSERT, move T

Page 15: BalanceLine4j Framework Overview

Balance Line Algorithm – Step by Step

4 – Begin key comparison

Master: 5, 10, 17, 20

Transaction: 3, 10, 17, 18

KM = 5, KT = 10. KM < KT: DELETE, move M

Page 16: BalanceLine4j Framework Overview

Balance Line Algorithm – Step by Step

4 – Begin key comparison

Master: 5, 10, 17, 20

Transaction: 3, 10, 17, 18

KM = 10, KT = 10. KM = KT: UPDATE, move M and T

Page 17: BalanceLine4j Framework Overview

Balance Line Algorithm – Step by Step

4 – Begin key comparison

Master: 5, 10, 17, 20

Transaction: 3, 10, 17, 18

KM = 17, KT = 17. KM = KT: UPDATE, move M and T

Page 18: BalanceLine4j Framework Overview

Balance Line Algorithm – Step by Step

4 – Begin key comparison

Master: 5, 10, 17, 20

Transaction: 3, 10, 17, 18

KM = 20, KT = 18. KM > KT: INSERT, move T

Page 19: BalanceLine4j Framework Overview

Balance Line Algorithm – Step by Step

4 – Begin key comparison

Master: 5, 10, 17, 20

Transaction: 3, 10, 17, 18

KM = 20, no KT left (the transaction file is exhausted): DELETE, move M

Page 20: BalanceLine4j Framework Overview

Balance Line Algorithm – Step by Step

5 – Final master file

Master: 3, 10, 17, 18
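The whole walk-through can be condensed into a short loop. The sketch below is an illustration, not the BalanceLine4j source: the class name BalanceLineSketch is hypothetical, keys are plain integers, each list is assumed to be sorted ascending with no duplicate keys, and a full sync is assumed (master records missing from the transaction file are deleted).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the two-pointer comparison from the steps above.
class BalanceLineSketch {

    // Returns the operations in the order they are decided.
    static List<String> balance(List<Integer> master, List<Integer> transaction) {
        List<String> ops = new ArrayList<>();
        int m = 0; // master pointer
        int t = 0; // transaction pointer

        while (m < master.size() || t < transaction.size()) {
            if (m == master.size()) {
                ops.add("INSERT " + transaction.get(t++)); // master exhausted
            } else if (t == transaction.size()) {
                ops.add("DELETE " + master.get(m++));      // transaction exhausted
            } else {
                int km = master.get(m);
                int kt = transaction.get(t);
                if (km > kt) {
                    ops.add("INSERT " + kt);               // key only in transaction
                    t++;
                } else if (km < kt) {
                    ops.add("DELETE " + km);               // key only in master
                    m++;
                } else {
                    ops.add("UPDATE " + km);               // key in both sources
                    m++;
                    t++;
                }
            }
        }
        return ops;
    }
}
```

Called with master keys 5, 10, 17, 20 and transaction keys 3, 10, 17, 18, it yields INSERT 3, DELETE 5, UPDATE 10, UPDATE 17, INSERT 18, DELETE 20, which leaves the final master with keys 3, 10, 17, 18. Each record is visited exactly once.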

Page 21: BalanceLine4j Framework Overview

BalanceLine4j Framework

Java implementation of the Balance Line algorithm

Focus on business rules and let the framework handle the algorithm

Provides an abstraction of Sequential Data Sources, which can be any sortable data set (Comparable<T>): object collections, Sets, Maps; text files (with a built-in text file sorter); database result sets; custom sources (interface provided)

The algorithm runs by data streaming, with little memory consumption

Easy to use: simple API, no knowledge of the algorithm required

Easier to maintain and evolve, because it isolates the business rules from the algorithm code
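To make the ideas of streaming and of isolating the business rules more concrete, here is one possible design, sketched under loud assumptions: SyncHandler and StreamingBalance are hypothetical names invented for this illustration and are not the BalanceLine4j API. The sources are plain iterators that deliver non-null records already sorted by the given comparator, so only one record from each source is held at a time.

```java
import java.util.Comparator;
import java.util.Iterator;

// Hypothetical callback interface (not the framework's API):
// the business rules live in its implementations.
interface SyncHandler<T> {
    void onInsert(T transaction);
    void onUpdate(T master, T transaction);
    void onDelete(T master);
}

class StreamingBalance {

    // Consumes two iterators sorted by the same comparator and
    // dispatches each decision to the handler.
    static <T> void run(Iterator<T> master, Iterator<T> transaction,
                        Comparator<? super T> byKey, SyncHandler<T> handler) {
        T m = master.hasNext() ? master.next() : null;
        T t = transaction.hasNext() ? transaction.next() : null;

        while (m != null || t != null) {
            // An exhausted source compares as "greater" than any remaining record.
            int cmp = (m == null) ? 1 : (t == null) ? -1 : byKey.compare(m, t);
            if (cmp > 0) {
                handler.onInsert(t);
                t = transaction.hasNext() ? transaction.next() : null;
            } else if (cmp < 0) {
                handler.onDelete(m);
                m = master.hasNext() ? master.next() : null;
            } else {
                handler.onUpdate(m, t);
                m = master.hasNext() ? master.next() : null;
                t = transaction.hasNext() ? transaction.next() : null;
            }
        }
    }
}
```

With a split like this, changing what an insert, update or delete means only touches the handler, while the comparison loop stays the same.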

Page 22: BalanceLine4j Framework Overview

BalanceLine4j Framework – Additional Features

FileSorter.java

The framework provides a file sorter class that can safely sort large volumes of text data without exhausting memory: it writes temporary chunks of data to the file system and then merge-sorts all the chunks.
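The general technique described here is usually called an external merge sort. The sketch below shows the idea only and is not the code or the API of FileSorter.java: the class name ExternalSortSketch, the method signature and the maxLinesInMemory parameter are assumptions made for the illustration. Lines are compared as plain strings, so it assumes the key is a fixed-width prefix of each line, as in a positional text file.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.PriorityQueue;

// Hypothetical sketch of an external merge sort over text lines.
class ExternalSortSketch {

    // One open chunk file plus the line currently at its head.
    private static final class Head implements Comparable<Head> {
        final BufferedReader reader;
        String line;

        Head(BufferedReader reader) throws IOException {
            this.reader = reader;
            this.line = reader.readLine();
        }

        public int compareTo(Head other) {
            return line.compareTo(other.line);
        }
    }

    static void sort(Path input, Path output, int maxLinesInMemory) throws IOException {
        List<Path> chunks = new ArrayList<>();
        // Phase 1: cut the input into sorted chunks that fit in memory.
        try (BufferedReader in = Files.newBufferedReader(input, StandardCharsets.UTF_8)) {
            List<String> buffer = new ArrayList<>();
            String line;
            while ((line = in.readLine()) != null) {
                buffer.add(line);
                if (buffer.size() == maxLinesInMemory) {
                    chunks.add(writeChunk(buffer));
                    buffer.clear();
                }
            }
            if (!buffer.isEmpty()) chunks.add(writeChunk(buffer));
        }
        // Phase 2: k-way merge; the queue always yields the smallest head line.
        PriorityQueue<Head> heads = new PriorityQueue<>();
        try (BufferedWriter out = Files.newBufferedWriter(output, StandardCharsets.UTF_8)) {
            for (Path chunk : chunks) {
                Head head = new Head(Files.newBufferedReader(chunk, StandardCharsets.UTF_8));
                if (head.line != null) heads.add(head); else head.reader.close();
            }
            while (!heads.isEmpty()) {
                Head head = heads.poll();
                out.write(head.line);
                out.newLine();
                head.line = head.reader.readLine();
                if (head.line != null) heads.add(head); else head.reader.close();
            }
        } finally {
            for (Head head : heads) head.reader.close();
            for (Path chunk : chunks) Files.deleteIfExists(chunk);
        }
    }

    // Sorts the buffered lines and writes them to a temporary chunk file.
    private static Path writeChunk(List<String> buffer) throws IOException {
        Collections.sort(buffer);
        Path chunk = Files.createTempFile("chunk", ".txt");
        Files.write(chunk, buffer, StandardCharsets.UTF_8);
        return chunk;
    }
}
```

Memory use is bounded by maxLinesInMemory lines during the first phase and by one line per chunk during the merge.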

Page 23: BalanceLine4j Framework Overview

Back to Case Study

Master File: bank agencies database table (select * order by)

Transaction File: positional text file with the newest agency information (if not sorted, use the FileSorter class)

Key: string concatenation of bank number + agency number

Sync Mode: full (if an agency no longer exists, delete it)

Benchmark: Dummy Solution vs. Balance Line Solution
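One detail of the concatenated key is worth a small sketch. If the two numbers are joined without padding, bank 1 with agency 23 and bank 12 with agency 3 both produce "123", and string order no longer follows numeric order. A fixed-width key avoids both problems. The class name AgencyKey and the widths used here (3 digits for the bank, 5 for the agency) are assumptions for illustration only; the presentation does not state the real field sizes.

```java
// Hypothetical helper: builds a fixed-width key from bank number + agency number.
class AgencyKey {

    // Zero-pads both parts so that comparing keys as strings gives the same
    // order as comparing (bankNumber, agencyNumber) numerically.
    // Widths 3 and 5 are illustrative assumptions; non-negative numbers are assumed.
    static String of(int bankNumber, int agencyNumber) {
        return String.format("%03d%05d", bankNumber, agencyNumber);
    }
}
```

For example, AgencyKey.of(1, 23) gives "00100023" and AgencyKey.of(12, 3) gives "01200003", so the two agencies stay distinct and sort in the expected order.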

Page 24: BalanceLine4j Framework Overview

Back to Case Study

Dummy Solution:

1 random access for each transaction record

33,218 lines x 1 query with a “where” clause = 33,218 queries with a “where” clause

Same slow processing time in every sync

Balance Line Solution:

1 single sequential access

1 query with an “order by” clause

Faster processing time in the first sync (70% up) and much faster in subsequent syncs (fewer changes = less processing time, because the keys move faster)

Page 25: BalanceLine4j Framework Overview

BalanceLine4j Framework – Complementary Strategies

To further increase performance of the Balance Line processing algorithm, there are some complementary techniques that can be used:

Dump data from the database to text files, work at the filesystem I/O level and then update the database (filesystem I/O is faster than network I/O)

Sometimes using a hash code (MD5, SHA-1) to check whether a record has changed is faster than comparing field by field

Use a transaction code (insert, update, delete) to identify the type of transaction made for each record in the transaction file

Buffer some records in memory to optimize the data streaming
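The hash code strategy from the list above could look like the sketch below. It uses the standard java.security.MessageDigest class with SHA-1; the class name RecordDigest, the "|" separator (assumed never to occur inside a field) and the idea of storing the hash next to each master record are assumptions for illustration, not something the presentation specifies.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for change detection by hash.
class RecordDigest {

    // Returns the SHA-1 of the fields joined with "|", as lowercase hex.
    static String sha1(String... fields) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-1");
            byte[] hash = digest.digest(String.join("|", fields).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-1 must be supported by every Java platform implementation.
            throw new IllegalStateException(e);
        }
    }

    // Compares a previously stored hash with the hash of the incoming fields.
    static boolean changed(String storedHash, String... incomingFields) {
        return !storedHash.equals(sha1(incomingFields));
    }
}
```

The gain, when there is one, comes from comparing a single short value instead of every field, which pays off mainly when the master side keeps its hash precomputed.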

Page 26: BalanceLine4j Framework Overview

Thanks!

[email protected]

More Information and Samples

Project Site: https://github.com/gibaholms/balanceline4j/

Author's Blog: http://gibaholms.wordpress.com/

Author's Twitter: @gibaholms