45
The one - million table partitions challenge in an ATLAS experiment DB application @ CERN Gancho Dimitrov (CERN), on behalf of the ATLAS collaboration Bulgarian Oracle User Group conference (BGOUG), 7 th -9 th June 2019, Borovets resort

The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

The one-million table partitions challenge in an ATLAS experiment DB application @ CERN

Gancho Dimitrov (CERN), on behalf of the ATLAS collaboration

Bulgarian Oracle User Group conference (BGOUG), 7th -9th June 2019, Borovets resort

Page 2: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

About Me•Member of the ATLAS experiment database group since 2006

•Acquired (some) DB related knowledge throughout the last 15+ years

•Main focus on :- data management- database schema design- database performance tuning

•Certified in Oracle RDBMS (9i, 10g, 12c)

•Certified Toastmasters International Competent Communicator

2

Page 3: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

3

CERN - European Organization for Nuclear Research founded in 1954. Situated on the French-Swiss border near Geneva

2400 staff members work at CERN as personnel 10 000 more researchers from institutes world-wide

Page 4: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

LHC

• 27km ring of superconducting magnets

• 100m beneath the France–Switzerland border

• 2 high-energy particle beams travel at close to the speed of light before they are made to collide

• 600M collisions per second

LHC (Large Hadron Collider) is the World’s Largest Particle Accelerator.

4

Page 5: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

ATLAS experiment data taking video…

5

Page 6: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Events (particle collisions) in the ATLAS detector @LHC

Particle collisions are called Events

Datasets

Files

Events

6

25m

Muon Spectrometer

Calorimeter MagnetSystemInner Detector

44m

7000t

Events’ data are stored into files outside the DB (435 PB)

Page 7: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

The Event WhiteBoard (EWB) project @ Oracle 18.3

•EWB concept: logically groups particle collision Events (aka “collections”)

•EWB collections: consumed by a system which process them in Event ranges•The smallest event range for processing: a single Event•Collection’s flexible metadata: JSON block in VARCHAR2 (upto 32K)

•EWB collection removal: once processing of a given collection is finished•Lifetime of an EWB collection: from week(s) to month(s)•Daily rate of EWB collection creation: tens to hundreds or thousands

•Rows per EWB collection: thousands to millions

7

Page 8: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Challenge in the DB schema design

Common engineering solution (might not scale in several years)

versus

Over-engineered system (should scale, but likely to be unnecessarily complex)

8

Page 9: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

An Event Whiteboard (visual representation)

9

Page 10: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Idea for the EWB data physical organization

•Data of each EWB collection: self-contained in a table partition

•Table indices: partitioned in the same fashion as the table

•EWB ”sponge” process: removes EWB collection(s) data by dropping partition(s)

10

Page 11: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Quiz

What is the maximum allowed number of partitions (or sub-partitions) in a single table in Oracle RDBMS?

1) 100 thousand

2) 500 thousand

3) 1 million

4) More

11

Page 12: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Quiz answerThe allowed maximum number of partitions in a single table in Oracle is

1048575(1024 * 1024) – 1

12

The max number of EWB collections at any given time is limited to

1048575

Page 13: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

The ATLAS DBAs experience (and how far can we go?)•The ATLAS EventIndex is a catalogue containing information about the basic

properties of the Events as well as references to its storage location.

13

DATASETS

PARENT TABLE

DATASET_IDPROJECTRUNNUMBERSTREAMNAMEPRODSTEPDATATYPEAMITAG

EVENTS

CHILD TABLE

DATASET_IDEVENTNUMBERLUMIBLOCKNBUNCHIDGUID0GUID1GUID2

Unique Key

Primary Key

LIST PARTITIONED BY DATASET_ID

CREATE TABLE child_table (DATASET_ID NUMBER(10,0),...CONSTRAINT cons_name PRIMARY KEY (..) using index COMPRESS 1LOCAL) pctfree 0 COMPRESS BASIC tablespace &&m_tbsPARTITION BY LIST(DATASET_ID)( PARTITION DATASET_ZERO VALUES(0) );

Page 14: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

The ATLAS DBAs experience (cont.)•The EventIndex system is in production since March 2016

•As of May 2019, the EventIndex hosts 177 billion rows (event records unevenly distributed in about 73300 list-type partitions (rate of about 25K new partitions per year)

•Note: no concurrency in data load, compression on (basic deduplication)

14

TABLE PARTITIONS3.3 TB

INDEX PARTITIONS 2.9 TB

Page 15: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

EWB simplified tables layout (partial)

15

CO

LLE

CTI

ON

S

COLLECTION ID

COLLECTION NAMECOLLECTION TYPE

COLLECTION STATUS

CREATION TIME

MODIFICATION TIME

…… more columns …

COLLECTION METADATA

PRIMARY KEYUNIQUE KEY

JSON block in VARCHAR2(32K)

CO

LLE

CTI

ON

_OB

JEC

TS COLLECTION ID

STORAGE SERVER ID

EVENT RANGE MIN ID

EVENT RANGE MAX ID

EVENT RANGE STATUS

... more columns …

EVENT RANGE METADATA

PRIMARY KEY

JSON block

Page 16: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

How to partition the EWB “collection_objects” child table?

16

Page 17: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Design approach “1” Range partition on COLL_ID

INTERVAL(value)

Idea: introduce automatic RANGE + INTERVAL partitioning based on the COLL_ID’s incremental values.

The INTERVAL value can be changed/adjusted over the time.

17

Page 18: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Range partitioned table with INTERVAL(10)

18

CREATE TABLE COLLECTION_OBJECTS(COLL_ID NUMBER(10,0),...CONSTRAINT COLLOBJ_PK PRIMARY KEY (...) using index LOCAL)PARTITION BY RANGE(COLL_ID) INTERVAL(10)( PARTITION COLLOBJ_ZERO VALUES(0) );

If necessary, the INTERVAL(value) can be changed over the time:

ALTER TABLE COLLECTION_OBJECTS SET INTERVAL(50);Table altered.

Page 19: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Range Interval automatic partitions creation

19

INSERT INTO COLLECTION_OBJECTS (coll_id) VALUES (1E3); INSERT INTO COLLECTION_OBJECTS (coll_id) VALUES (1E4); INSERT INTO COLLECTION_OBJECTS (coll_id) VALUES (1E5); INSERT INTO COLLECTION_OBJECTS (coll_id) VALUES (1E6); INSERT INTO COLLECTION_OBJECTS (coll_id) VALUES (1E7);

Created six relevant table partitions:

table_name part_position partition_name high_value

COLLECTION_OBJECTS 1 COLLOBJ_ZERO 0COLLECTION_OBJECTS 2 SYS_P560581 1010COLLECTION_OBJECTS 3 SYS_P560587 10010COLLECTION_OBJECTS 4 SYS_P560591 100010COLLECTION_OBJECTS 5 SYS_P560610 1000010COLLECTION_OBJECTS 6 SYS_P560663 10000010

Page 20: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Range Interval automatic partitions creation (cont.)

20

INSERT INTO COLLECTION_OBJECTS (coll_id) VALUES (10485740);ORA-14300: partitioning key maps to a partition outside maximum permitted number of partitionsWhy? The table currently has only 6 partitions(!)

Oracle allows max 1048575 partitions per table.With INTERVAL(10) => the MAX allowed COLL_ID should be 10485750?

INSERT INTO COLLECTION_OBJECTS (coll_id) VALUES (10485739);1 row created.

table_name part_position partition_name high_value ...COLLECTION_OBJECTS 6 SYS_P560663 10000010COLLECTION_OBJECTS 7 SYS_P561534 10485740

Page 21: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

“Range Interval” partitioned table •There is a cap of 1048575 partitions, even if these partitions are not yet in existence. Oracle “pre-defines” all potential partitions.

•Sooner or later (depending on the COLL_ID value and the INTERVAL value) we will get ORA-14300 error no matter that fraction of the existing partitions will be removed over the time.

•Partition removal using the PARTITION FOR(…) is particular! ALTER TABLE COLLECTION_OBJECTS DROP partition FOR (0);removed partition with high value 1010 ( 0 is its lower boundary)

21

Page 22: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Design approach “2” List partition per COLL_ID single value

Idea: each table partition contains data of a single EWB collection. Removal of any EWB collection data would be straightforward.

22

Page 23: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

List-type partition for each data collection

AUTOMATIC option is avoided because partitioned JSON SEARCH INDEX could not be created (Oracle 18c, 19c)CREATE SEARCH INDEX idx_name ON COLLECTION_OBJECTS(OBJ_METADATA) FOR JSON LOCAL;

ORA-29886: feature not supported for domain indexes

23

CREATE TABLE COLLECTION_OBJECTS(COLL_ID NUMBER(10,0),...CONSTRAINT COLLOBJ_PK PRIMARY KEY (...) using index LOCAL)PARTITION BY LIST(COLL_ID)( PARTITION COLLOBJ_ZERO VALUES(0) );

Page 24: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Automation in List-type partitions creation •A dedicated List partition per Collection is created by an AFTER INSERT

trigger on parent table which calls an in-house created PLSQL proc.

•Partition pruning is straightforward: SELECT * FROM COLLECTION_OBJECTS partition FOR (5276);

•Partition removal is easy:ALTER TABLE COLLECTION_OBJECTS DROP partition FOR (5276);

•All worked well, but does not seem scalable because of the 1048575 partitions limit per table.

24

Page 25: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Design approach “3” List partition per sequence of COLL_ID values

Idea: Each List-type table partition to host sequence of data collections (e.g. 10, 20, 50 or more collections per partition).

Pros: can accommodate much more collections over time

Cons: all collections of a given partition must be set “not needed” (status = ‘NN’) before a partition removal action

25

Page 26: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

List-type partition for sequence of data collections

LIST AUTOMATIC option is not possible in this approach.

26

CREATE TABLE COLLECTION_OBJECTS(COLL_ID NUMBER(10,0),...CONSTRAINT COLLOBJ_PK PRIMARY KEY (...) using index LOCAL)PARTITION BY LIST(COLL_ID)( PARTITION COLLOBJ_ZERO VALUES(0) );

Page 27: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Automation in List-type partitions creation (COLL_ID set)•A dedicated List partition per collection set is created by a BEFORE INSERT

trigger on the parent table which calls an in-house created PLSQL proc.

27

Page 28: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

PLSQL code and created partitions

28

Page 29: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Interesting findings: •Achieved flexibility as the number of sequential collections per partition can be

changed just by changing a single value in the ”before insert” trigger:

Sequence of 10 list values: created 88485 partitionsSequence of 5 list values: created 32745 partitions

•After creation of 121230 partitions:

“Error "ORA-14309: Total count of list values exceeds maximum allowed"

• What is the maximum number of list values allowed by the DB (Oracle 18.3)?

Count on the existing list partition key values showed :

1048575= (1024*1024)-1

29

Page 30: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

ORA-14309 and ORA-14299•In that case

"ORA-14309: Total count of list values exceeds maximum allowed"

and

“ORA-14299: total number of partitions exceeds the maximum limit”

are similar as Oracle imposes the same limit of 1048575.

•Outcome: ORA-14309 is a showstopper for ”Approach 3”

30

Page 31: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Design approach “4” List partition on virtual column based on

COLL_ID

Idea: List partition on virtual column MOD(COLL_ID, nnn).It guarantees maximum “nnn” partitions on the child table (note: ”nnn” must be smaller than 1 million)

Avoids the number of partitions limit (ORA-14299) and the number of list-key values limit (ORA-14309).

31

Page 32: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

List-type partition on virtual column

MOD function returns the remainder of COLL_ID divided by 500000.

The table will have max 500K partitions

The choice of the division value is crucial, it cannot be changed over time (ORA-14060). 32

CREATE TABLE COLLECTION_OBJECTS(COLL_ID NUMBER(10,0),COLL_ID_VIRT_GROUP NUMBER(10,0) GENERATED ALWAYS AS (MOD(COLL_ID,500000)) VIRTUAL,...CONSTRAINT COLLOBJ_PK PRIMARY KEY (...) using index LOCAL)PARTITION BY LIST(COLL_ID_VIRT_GROUP)( PARTITION COLLOBJ_ZERO VALUES(0) );

Page 33: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

”Trigger & Proc” for List-type partitions creation •A dedicated List partition per Collection is created by an AFTER INSERT

trigger on parent table which calls an in-house created PLSQL proc.

33

Note: 1) both, the parent and the child tables have the same virtual column definitionCOLL_ID_VIRT_GROUP NUMBER(10,0) GENERATED ALWAYS AS (MOD(COLL_ID,500000)) VIRTUAL

2) The PLSQL proc handles “ORA-14312: Value x already exists in partition y”

Page 34: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Considerations with List partitions on virtual column•Test: 266650 partitions were created

•For PK or UQ indexes to be equally partitioned as the table, the virtual column must be part of them.

•Partition pruning to kick in, an extra condition in the queries WHERE clause has to be added:

SELECT * FROM ... WHERE COLL_ID_VIRT_GROUP =MOD(:coll_id_value, 500000) AND COLL_ID = :coll_id_value AND STATUS = :status_val ;

34

Page 35: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Design approach “5” List automatic partition on virtual column based

on COLL_ID

Idea: the same as “Approach 4” but relies on the Oracle List AUTOMATIC mechanism instead of “Trigger & Procedure” machinery.Note: the requirement for JSON search index was withdrawn.

35

Page 36: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

List-type automatic table partitioning on virtual column

MOD function returns the remainder of COLL_ID divided by 500000.

The table will have max 500K partitions

36

CREATE TABLE COLLECTION_OBJECTS(COLL_ID NUMBER(10,0),COLL_ID_VIRT_GROUP NUMBER(10,0) GENERATED ALWAYS AS (MOD(COLL_ID,500000)) VIRTUAL,...CONSTRAINT COLLOBJ_PK PRIMARY KEY (...) using index LOCAL)PARTITION BY LIST(COLL_ID_VIRT_GROUP)AUTOMATIC( PARTITION COLLOBJ_ZERO VALUES(0) );

Page 37: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Data insertion into “List automatic” partitioned table

•In case of many rows insertion into a “List automatic” partitioned table:

Oracle sorts the rows beforehand based on the partitioning key and wait events “PGA memory operation” could be seen.

Such insert operations are slower than insertion into a standard List partitioned table.

37

Page 38: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

“List automatic“ partitions on virtual column•Test: 500000 partitions were automatically created using “INSERT INTO collection_objects …” statement.

•It took about a week time.Over the time, a partition creation was becoming slower and slower(certainly DDL actions are serialized).

38

Upto 30K partitions: rate of 50-60 partitions/secondAfter 70K partitions: rate of 3-4 partitions/secondAfter 80K partitions: rate of 3 partitions/secondAfter 160K partitions: rate of 1-2 partitions/secondAfter 180K partitions: rate of 1 partition/secondAfter 200K partitions: rate of 1 partition/secondWithin 200K-400K partitions: rate of 1 partition per 1-2 secondsWithin 400K-500K partitions: rate of 1 partition per 2 seconds

Page 39: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

“List automatic“ partitions on virtual column (cont.)•Without concurrent read or write operations on the table:

Wait events "PGA memory operation” and ”Local write wait” (0 ms) on partition creation.

• With concurrent read (even the Oracle auto stats gathering job) or write operations on the table:

Wait events “Library cache lock” and “Library cache load lock” (20ms-500ms depending on the number of existing table partitions)on partition creation.

39

Page 40: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Considerations with List automatic part. on virtual column•For PK or UQ indices to be equally partitioned as the table, the virtual column must be part of them.

•Partition pruning to kick in, an extra condition in the queries WHERE clause has to be added:

SELECT * FROM ... WHERE COLL_ID_VIRT_GROUP =MOD(:coll_id_value, 500000) AND COLL_ID = :coll_id_value AND STATUS = :status_val ;

40

Page 41: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

EWB “sponge” on “List automatic” virtual column part.•How safely remove partitions in DML concurrent environment?

ALTER TABLE … MODIFY PARTITION FOR(499077) READ ONLY;Table altered.

ALTER TABLE … DROP PARTITION FOR(499077);ORA-14466: Data in a read-only partition or subpartition cannot be modified.

41

Page 42: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

EWB “sponge” process in action 1. Check all collections of a given virtual group have certain status

SELECT COUNT(coll_id) - SUM(DECODE(status, ‘NOT NEEDED', 1, 0)) FROM collections WHERE coll_id_virt_group = 499077;

2. If the query result = 0, then:LOCK TABLE COLLECTION_OBJECTS PARTITION FOR(499077) IN SHARE MODE;

3. Repeat (1) to ensure that concurrent of step 2 transaction did not happen meanwhile.

4. If the query result = 0, then: ALTER TABLE COLLECTION_OBJECTS DROP PARTITION FOR(499077);

42

Page 43: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Which is the best approach out of the explored five?

Approach 1: “Range Interval”

versus

Approach 5: “List Automatic on virtual column”

43

More studies are necessary

Page 44: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Take away message

44

Work with Oracle table partitions is mixture offunand

challenges

---Obviously

NO PAIN, NO GAIN !

Page 45: The one-million table partitions challenge in an ATLAS experiment … · 2019. 6. 18. · About Me •Member of the ATLAS experiment database group since 2006 •Acquired (some) DB

Thank you for your interest!

[email protected]

Twitter: @GanchoDim