PK Chunking Divide and conquer massive objects in Salesforce Daniel Peter Lead Applications Engineer, Kenandy @danieljpeter Bay Area Salesforce Developer User Group

Forcelandia 2016 PK Chunking



Takeaways: How to avoid these errors

Query not “selective” enough:
• Non-selective query against large object type (more than 100000 rows).

Query takes too long:
• No response from the server
• Time limit exceeded
• Your request exceeded the time limit for processing

Too much data returned in query:
• Too many query rows: 50001
• Remoting response size exceeded maximum of 15MB.

GET THE DATA

Sounds great. How?
Not so fast…

…first we need some pre-requisite knowledge!

• Database Indexes
• Salesforce Ids

Database indexes (prereq)

“Allow us to quickly locate rows without having to scan every row in the database” (paraphrased from Wikipedia)

Database indexes (prereq)
Location, Location, Location

Salesforce Ids (prereq)
• Composite key containing multiple pieces of data.
• Uses base 62 numbering instead of the more common base 10.
• Fastest way to find a database row.

Salesforce Ids (prereq)

Base 62
Digits  Values
1       62
2       3,844
3       238,328
4       14,776,336 (million)
5       916,132,832 (million)
6       56,800,235,584 (billion)
7       3,521,614,606,208 (trillion)
8       218,340,105,584,896 (trillion)
9       13,537,086,546,263,552 (quadrillion)

Base 10
Digits  Values
1       10
2       100
3       1,000
4       10,000
5       100,000
6       1,000,000 (million)
7       10,000,000 (million)
8       100,000,000 (million)
9       1,000,000,000 (billion)

Base 10 vs Base 62
Base 10 digits: 0123456789
Base 62 digits: 0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz
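Each entry in the Values column is simply the base raised to the number of digits. A minimal JavaScript sketch of that arithmetic follows; the function names are made up for illustration. It uses BigInt because the largest 9 digit base 62 value is bigger than Number.MAX_SAFE_INTEGER, so a plain JavaScript Number would round it.

```javascript
// Number of distinct values that fit in `digits` positions of a given base.
function valuesForDigits(base, digits) {
  return BigInt(base) ** BigInt(digits);
}

// True when every value of that width is exactly representable as a JS Number.
function fitsInSafeInteger(base, digits) {
  const largest = valuesForDigits(base, digits) - 1n;
  return largest <= BigInt(Number.MAX_SAFE_INTEGER);
}

console.log(valuesForDigits(62, 9).toString()); // 13537086546263552
console.log(fitsInSafeInteger(62, 9));          // false
```

In Apex the same 9 digit value fits comfortably in a 64-bit Long.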

Salesforce Ids (prereq)

MO’ NUMBERS

Base 62

Prerequisites complete!

How does PK Chunking work?
Analogy: fetching people in a city.

Fetching people in a city: problems
Non-selective

Request: “get me all the people who are female”

Response: “yer trippin’!”

Fetching people in a city: problems
Timeout

Request: “find me a 7 foot tall person in a pink tuxedo in Beijing”

Response: (after searching all day) “I can’t find any! I give up!”

Finding people in a city: problems
Too many people found

Request: “find me all the men in San Francisco with beards”

Response: (after searching for 10 mins) “The bus is full!”

PK Chunking addresses all those problems
Divide and conquer!
Parallelism!

Fetching people in a city: solutions
Non-selective

Request: “get me all the people who are female, in your small search area”

Response: “¡Con mucho gusto!”

Fetching people in a city: solutions
Timeout

Request: “find me a 7 foot tall person in a pink tuxedo in Beijing, in your small search area”

Response:
SP1: “Didn’t find any, sorry!”
SP2: “Didn’t find any, sorry!”
SP3: “Found one!”
SP4: “Didn’t find any, sorry!”

Finding people in a city: solutions
Too many people found

Request: “find me all the men in San Francisco with beards, in your small search area”

Response:
SP1: 30 people in our bus
SP2: Didn’t find any
SP3: 50 people in our bus

Technical details

2 different implementations

QLPK: Query Locator PK Chunking

Base62PK: Base62 PK Chunking

QLPK
Salesforce SOAP or REST API – AJAX toolkit works great.

Create and leverage a server-side cursor. Similar to an Apex query locator (Batch Apex).

Analogy: Print me a phone book of everyone in the city so I can flip through it.

QLPK – AJAX Toolkit Request

QLPK – AJAX Toolkit Response

Chunk the database, in the size of your choice, by offsetting the queryLocator:

01gJ000000KnRpDIAV-50000
01gJ000000KnRpDIAV-100000
…
01gJ000000KnRpDIAV-39950000
01gJ000000KnRpDIAV-40000000
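The list of offset locators above can be generated rather than typed out. This is a minimal sketch, assuming the locator has the form cursorId-offset as shown in the list; the function name, the sample "-2000" suffix on the starting locator, and the handling of a final partial chunk are illustrative assumptions, not taken from the talk.

```javascript
// Build one offset locator per chunk boundary from an existing queryLocator.
// Assumes the locator looks like "<cursorId>-<offset>".
function locatorOffsets(queryLocator, chunkSize, totalSize) {
  if (chunkSize <= 0) throw new Error('chunkSize must be positive');
  const cursorId = queryLocator.split('-')[0];
  const locators = [];
  for (let offset = chunkSize; offset <= totalSize; offset += chunkSize) {
    locators.push(cursorId + '-' + offset);
  }
  // Assumption: a trailing partial chunk ends at the total record count.
  if (totalSize % chunkSize !== 0) {
    locators.push(cursorId + '-' + totalSize);
  }
  return locators;
}

const locators = locatorOffsets('01gJ000000KnRpDIAV-2000', 50000, 40000000);
console.log(locators.length); // 800
```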

QLPK – The Chunks

800 chunks x 50,000 records = 40,000,000 total records

Analogy: we have exact addresses for clusters of 50k people to give to 800 different search parties.

QLPK – How to use in a query?
Perform 800 queries with the Id ranges in the WHERE clause:

SELECT Id, Autonumber__c, Some_Number__c
FROM Large_Object__c
WHERE Some_Number__c > 10 AND Some_Number__c < 20
AND Id >= 'a00J000000BWNYk'
AND Id <= 'a00J000000BWO4z'

THAT SPLIT CRAY

database so hard, take 800 queries to find me

QLPK – Parallelism

Yeah it’s 800 queries, but…

They all went out at once, and they might all come back at once.

Analogy: We hired 800 search parties and unleashed them on the city at the same time.
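A minimal sketch of the "all at once" idea in JavaScript, using Promise.all. Here runQuery is a hypothetical stand-in for whatever issues a single chunk query (a remoting call, a REST request, and so on), and chunks is assumed to be a list of objects with first and last Ids.

```javascript
// Start every chunk query immediately and wait for all of them to finish.
// runQuery(first, last) is a hypothetical function returning a Promise of rows.
async function queryAllChunks(chunks, runQuery) {
  const pending = chunks.map((chunk) => runQuery(chunk.first, chunk.last));
  const rowsPerChunk = await Promise.all(pending);
  // Merge the per-chunk results into one master array.
  return rowsPerChunk.flat();
}
```

Promise.all rejects as soon as any one chunk fails, so a real page would likely want per-chunk error handling instead.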

QLPK Base62PK

Shift Gears

Base62PK

Get the first and last Id of the database and extrapolate the ranges in between.

Analogy: Give me the highest and lowest address of everyone in the city and I will make a phone book with every possible address in it. Then we will break that into chunks.

Base62PK – first and last Id

Get the first Id
SELECT Id FROM Large_Object__c ORDER BY Id ASC LIMIT 1

Get the last Id
SELECT Id FROM Large_Object__c ORDER BY Id DESC LIMIT 1

Even on H-U-G-E databases these return F-A-S-T. No problem.

Base62PK – extrapolate

1. Chop off the last 9 digits of the 15 digit first/last Ids. Decompose.
2. Convert the 9 digit base 62 numbers into a Long Integer.
3. Add the chunk size to the first number until you hit or exceed the last number.
4. Last chunk may be smaller.
5. Convert those Long Integers back to base 62 and re-compose the 15 digit Ids.
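The five steps above can be sketched in JavaScript as follows. This is an illustration under stated assumptions, not the implementation from the talk: the function names are made up, the digit order follows the base 62 alphabet shown earlier in the deck (0-9, A-Z, a-z), and BigInt stands in for the Long Integer because 9 base 62 digits can exceed what a JavaScript Number holds exactly.

```javascript
// Base 62 alphabet in the order shown on the "Base 10 vs Base 62" slide.
const ALPHABET = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Step 2: base 62 text -> integer.
function base62ToBigInt(text) {
  let value = 0n;
  for (const ch of text) {
    const digit = ALPHABET.indexOf(ch);
    if (digit < 0) throw new Error('Not a base 62 digit: ' + ch);
    value = value * 62n + BigInt(digit);
  }
  return value;
}

// Step 5: integer -> base 62 text, left-padded with '0' to a fixed width.
function bigIntToBase62(value, width) {
  let text = '';
  for (let v = value; v > 0n; v /= 62n) {
    text = ALPHABET[Number(v % 62n)] + text;
  }
  return text.padStart(width, '0');
}

// Steps 1, 3 and 4: split the Ids, walk from first to last in chunkSize steps.
function chunkIdRange(firstId, lastId, chunkSize) {
  if (chunkSize <= 0) throw new Error('chunkSize must be positive');
  const prefix = firstId.slice(0, 6); // everything before the last 9 digits
  if (lastId.slice(0, 6) !== prefix) throw new Error('Ids have different prefixes');
  const last = base62ToBigInt(lastId.slice(6, 15));
  const size = BigInt(chunkSize);
  const chunks = [];
  for (let start = base62ToBigInt(firstId.slice(6, 15)); start <= last; start += size) {
    const candidate = start + size - 1n;
    const end = candidate < last ? candidate : last; // last chunk may be smaller
    chunks.push({
      first: prefix + bigIntToBase62(start, 9),
      last: prefix + bigIntToBase62(end, 9),
    });
  }
  return chunks;
}
```

Each returned object carries the first and last Id for one chunk, ready to drop into the Id >= and Id <= conditions of a chunk query.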

Base62PK – benefits

• High performance! Calculates the Ids instead of querying for them.

Base62PK – issues
• Digits 4 and 5 of the Salesforce Id are the pod identifier. If the Ids in your org have different pod Ids, this technique will break unless enhanced.
• Fragmented Ids lead to sparsely populated ranges. You will search entire ranges of Ids which have no records.
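Both issues can be checked up front from the first Id, the last Id and a record count. The sketch below is hedged: the function names are hypothetical, and "fragmentation" is assumed here to mean the size of the Id range divided by the number of records actually in it, which is one plausible reading rather than a definition given in the talk.

```javascript
const BASE62_DIGITS = '0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz';

// Digits 4 and 5 of the Id (string indexes 3 and 4) are the pod identifier.
function podId(id) {
  return id.slice(3, 5);
}

// Decode the last 9 digits of a 15 digit Id as a base 62 integer.
function idSuffixToBigInt(id) {
  let value = 0n;
  for (const ch of id.slice(6, 15)) {
    const digit = BASE62_DIGITS.indexOf(ch);
    if (digit < 0) throw new Error('Not a base 62 digit: ' + ch);
    value = value * 62n + BigInt(digit);
  }
  return value;
}

// Assumed definition: possible Ids in the range divided by actual records.
function fragmentationRatio(firstId, lastId, recordCount) {
  if (podId(firstId) !== podId(lastId)) {
    throw new Error('Pod identifiers differ; plain Base62PK ranges will not work');
  }
  const span = idSuffixToBigInt(lastId) - idSuffixToBigInt(firstId) + 1n;
  return Number(span) / recordCount;
}
```

A ratio near 1 means the calculated ranges are densely populated; a high ratio means many chunk queries will come back empty.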

So which do I pick?

QLPK or Base62PK

So which do I pick?

Heterogeneous Pod Ids Homogeneous Pod Ids

Low Id Fragmentation (<1.5x)

Medium Id Fragmentation (1.5x - 3x)

High Id Fragmentation (>3x)

QLPK X X X

Base62PK X X

How do I implement?

• Needs to be orchestrated via a language like JS in your page, or another platform (Heroku).
• Doesn’t work on Lightning Component Framework (yet). No support for real parallel controller actions (they are boxcarred).
• Has to be Visualforce or a Lightning / Visualforce hybrid.

How do I implement?

• Use RemoteActions to get the chunk queries back into your page.
• Can be granular or aggregate queries!
• Process each chunk query appropriately when it comes back. Ex.: update totals on a master object or push into a master array.

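// Fire one remoting call per chunk; they all go out at once.
function queryChunks() {
    for (var i = 0; i < chunkList.length; i++) {
        queryChunk(i);
    }
}

function queryChunk(chunkIndex) {
    var chunk = chunkList[chunkIndex];

    Visualforce.remoting.Manager.invokeAction(
        '{!$RemoteAction.Base62PKext.queryChunk}',
        chunk.first, chunk.last,
        function (result, event) {
            // Push this chunk's values into the master array.
            for (var i = 0; i < result.length; i++) {
                objectAnums.push(result[i].Autonumber__c);
            }

            // Once every chunk has reported back, run the completion step.
            queryChunkCount++;
            if (queryChunkCount == chunkList.length) {
                allQueryChunksComplete();
            }
        },
        // buffer: false sends each call as its own request instead of grouping them.
        {escape: false, buffer: false}
    );
}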

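@RemoteAction
public static List<Large_Object__c> queryChunk(String firstId, String lastId) {
    // Same filter for every chunk; only the Id range changes.
    // escapeSingleQuotes guards the dynamic SOQL against injected quotes.
    String SOQL = 'SELECT Id, Autonumber__c, Some_Number__c ' +
        'FROM Large_Object__c ' +
        'WHERE Some_Number__c > 10 AND Some_Number__c < 20 ' +
        'AND Id >= \'' + String.escapeSingleQuotes(firstId) + '\' ' +
        'AND Id <= \'' + String.escapeSingleQuotes(lastId) + '\'';

    return Database.query(SOQL);
}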

Landmines

Timeouts – retries
• Cache warming means that if at first you fail, try and try again!

Concurrency
• Beware: ConcurrentPerOrgApex Limit exceeded
• Keep your individual chunk queries lean: under 5 seconds each.
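Both landmines can be handled on the client. The sketch below is one way to do it, not something shown in the talk: withRetry re-runs a chunk query that failed (for example on a timeout), and runWithLimit caps how many chunk queries are in flight at once. Both names, and the idea of passing each chunk query as a function that returns a Promise, are assumptions for illustration.

```javascript
// Re-run a failing task up to maxAttempts times; rethrow the last error.
async function withRetry(task, maxAttempts) {
  let lastError;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await task();
    } catch (err) {
      lastError = err; // a later attempt may succeed once the cache is warm
    }
  }
  throw lastError;
}

// Run tasks with at most `limit` of them in flight at any time.
// Results come back in the same order as the tasks.
async function runWithLimit(tasks, limit) {
  const results = new Array(tasks.length);
  let next = 0;
  async function worker() {
    while (next < tasks.length) {
      const index = next++;
      results[index] = await tasks[index]();
    }
  }
  const workers = [];
  for (let i = 0; i < Math.min(limit, tasks.length); i++) {
    workers.push(worker());
  }
  await Promise.all(workers);
  return results;
}
```

The two combine naturally: wrap each chunk query as () => withRetry(() => runChunk(chunk), 3), where runChunk is whatever issues the query, and hand the resulting list to runWithLimit.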

Demos

Backup video:

https://www.youtube.com/watch?v=KqHOStka0eg

How did you figure this out?
Had to meet requirements for Kenandy’s largest customer, a $2.5B/yr manufacturer.

High visibility project.

Necessity is the mother of invention!

How did you figure this out?
Query Plan Tool

How did you figure this out?
Debug logs from real execution

Why doesn’t Salesforce do this?

They do! (kinda)

The Bulk API uses a similar technique, but it is more asynchronous and wrapped in a message container to track progress.
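With the Bulk API, PK chunking is switched on through a request header when the query job is created. A small sketch of building those headers follows; the function name is hypothetical, and the session Id and chunk size are placeholders you would supply. See the Bulk API documentation linked under More Info for the header's full set of options.

```javascript
// Build request headers for creating a Bulk API query job with PK chunking on.
// sessionId and chunkSize are supplied by the caller.
function pkChunkingHeaders(sessionId, chunkSize) {
  if (!Number.isInteger(chunkSize) || chunkSize <= 0) {
    throw new Error('chunkSize must be a positive integer');
  }
  return {
    'X-SFDC-Session': sessionId,
    'Content-Type': 'application/json',
    'Sforce-Enable-PKChunking': 'chunkSize=' + chunkSize,
  };
}
```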

More Info

Article on Salesforce Developers Blog
https://developer.salesforce.com/blogs/developer-relations/2015/11/pk-chunking-techniques-massive-orgs.html

Github repo
https://github.com/danieljpeter/pkChunking

Bulk API documentation
https://developer.salesforce.com/docs/atlas.en-us.api_asynch.meta/api_asynch/async_api_headers_enable_pk_chunking.htm

Q&A

Thank you!