59
APACHE SLING & FRIENDS TECH MEETUP BERLIN, 25-27 SEPTEMBER 2017 Binary Data Management Features in Oak 1.8 Matt Ryan, Adobe / Conrad Wöltge, Netcentric

Binary Data Management Features in Oak 1 · APACHE SLING & FRIENDS TECH MEETUP BERLIN, 25-27 SEPTEMBER 2017 Binary Data Management Features in Oak 1.8 Matt Ryan, Adobe / Conrad Wöltge,

  • Upload
    others

  • View
    3

  • Download
    0

Embed Size (px)

Citation preview

APACHE SLING & FRIENDS TECH MEETUPBERLIN, 25-27 SEPTEMBER 2017

Binary Data Management Features in Oak 1.8Matt Ryan, Adobe / Conrad Wöltge, Netcentric

Who are we?

2

▪ Matt Ryan▪ Senior Computer Scientist,

Adobe▪ Work on Apache Oak▪ Focus on data store

Who are we?

3

▪ Conrad Wöltge▪ CTO with Netcentric▪ 10 years with Sling▪ Profile with high

performance, large scale environments

Disclaimer

4

▪ We are representing Oak today, not Adobe▪ Presentation content includes things the Oak

team is exploring for a future version of Oak▪ Presentation content should not be interpreted

as a promise or expectation of any future capability or feature of AEM

Agenda

5

▪ CompositeDataStore▪ Other Binary Data Management Challenges

in Apache Oak▪ Unmanaged Binaries▪ Direct Read-Only Access

▪ Q&A

6

Oak CompositeDataStore

Motivations for CompositeDataStore

7

▪ Have you ever wished you could:▪ Have read-only access to an Oak data store?▪ Automatically archive infrequently-used data?▪ Use criteria other than file size to determine

where to store data?▪ Store different types of data in different data

stores?

What is a “CompositeDataStore?”

8

Oak

CompositeDataStore

Delegate Data Store #2

Delegate Data Store #1

Use Case – Testing Environment

9

▪ User may wish to mimic production environment in a testing environment▪ Duplicating the production data store may be

cost and time-prohibitive▪ What if Oak could access the production data

store read-only?

Use Case – Testing Environment

10

Production Environment

Oak Node Store

Data Store

Use Case – Testing Environment

11

Production Environment Testing Environment

Oak Oak Node Store

Data Store

Snapshot Replica

Node Store

Data Store

Use Case – Testing Environment

12

Production Environment Testing Environment

Oak OakNode Store

Node Store

Snapshot Replica∅Data Store

Data Store

Use Case – Testing Environment

13

Production Environment Testing Environment

Oak OakNode Store

Data Store

Node Store

Data Store

Use Case – Automatic Archiving

14

▪ Many cloud-storage systems offer different storage classes, or tiers▪ E.g. AWS S3, S3 IA, Glacier

▪ What if Oak could represent data in multiple storage classes in a single logical data store?

Use Case – Automatic Archiving

15

Oak

CompositeDataStore

S3

S3 IA

Glacier

90 Days

180 Days

Use Case – Automatic Archiving

16

Oak

CompositeDataStore

1

2

3

S3

S3 IA

Glacier

Use Case – Designate Highly-Available Data

17

▪ Some types of storage may offer more highly-available storage, at a higher expense▪ Putting the entire repo in HA storage costs more▪ Putting the entire repo in standard storage

means none of it is as highly available

▪ What if Oak could select where to store data based on some criteria tied to the data?

Use Case – Designate Highly-Available Data

18

Oak

Azure Blob Store - GRS

Azure Blob Store - LRS

Use Case – Designate Highly-Available Data

19

Oak

CompositeDataStore

Azure Blob Store – LRS

Azure Blob Store – GRS

Use Case – Designate Highly-Available Data

20

Oak

CompositeDataStore

Azure Blob Store – LRS

Azure Blob Store – GRS

Use Case – Designate Highly-Available Data

21

Oak

CompositeDataStore

Azure Blob Store – LRS

Azure Blob Store – GRS

Use Case – Designate Highly-Available Data

22

Oak

CompositeDataStore

Azure Blob Store – LRS

Azure Blob Store – GRS

Use Case – Designate Highly-Available Data

23

Oak

CompositeDataStore

Azure Blob Store – LRS

Azure Blob Store – GRS

Use Case – Designate Highly-Available Data

24

Oak

CompositeDataStore

Azure Blob Store – LRS

Azure Blob Store – GRS

Use Case – Designate Highly-Available Data

25

Oak

CompositeDataStore

Azure Blob Store – GRS

Azure Blob Store – LRS

Use Case – Store Binaries by Type

26

▪ An Oak user may wish to make use of cloud storage certain types of files, but keep other files local

▪ What if Oak could choose where to put a binary based on the node type or path?

Use Case – Store Binaries by Type

27

Oak

CompositeDataStore

S3DataStore

FileDataStore

Status

28

▪ Demoable POC –Testing Environment▪ Code is available at:

https://github.com/mattvryan/jackrabbit-oak/tree/composite-data-store

29

Demo

30

Additional Binary Data Management Challenges in Apache Oak

Challenges

31

▪ How do we quickly add a large content repository to Oak?

▪ How do we responsibly allow direct access to a binary?

Challenges

32

▪ How do we quickly add a large content repository to Oak?

▪ How do we responsibly allow direct access to a binary?

Adding Existing Repositories to Oak

33

Existing S3 Repository

User has existing 1TB content repository

Adding Existing Repositories to Oak

34

Existing S3 Repository

New Oak Instance

User wants to use Oak to manage the content repository…

Adding Existing Repositories to Oak

35

Existing S3 Repository

New S3 Repository??

New Oak Instance

User must import entire existing repository into a new repository

Adding Existing Repositories = Expensive

36

Challenges

37

▪ How do we quickly add a large content repository to Oak?

▪ How do we responsibly allow direct access to a binary?

Direct Binary Access

38

Oak Repo

Oak Instance

Native Application

Direct Binary Access

39

Oak Repo

Oak Instance

Native Application

Challenges with Direct Access

40

▪ Oak maps nodes to blobs▪ Oak controls access rights▪ Oak can’t support data that is accessed

directly

Direct Binary Access

41

Oak Repo

Oak Instance

Native Application

Direct Binary Access

42

Oak Repo

Oak Instance

Native Application

Direct Binary Access

43

Oak S3 Repo

Oak Instance

AWS Lambda

Processing Large Binary Files = Expensive

44

45

Unmanaged Binary Support

Unmanaged Binary

46

▪ Oak does not “manage” the binary▪ No deduplication▪ No garbage collection▪ No indexing▪ No guarantees on access permission, availability

or existence

Unmanaged Binary

47

▪ Oak knows about the binary▪ Oak stores a reference to the binary▪ Oak can fulfill requests for the binary▪ Oak can return a URI for direct access to the

binary

Adding Existing Repositories to Oak

48

New Oak S3 Repository

New Oak Instance

Oak can reference existing (unmanaged) repo immediately – new (managed) binaries in same bucket

Existing S3 Repository

Direct Binary Access

49

Oak Repo

Oak Instance

Native Application

Direct Binary Access

50

Oak Repo

Oak Instance

Native Application

Possible Usage (not final)

51

Resource resource =

resourceResolver.getResource(

“/path/to/unmanaged_binary.jpg”

);

URI unmanagedBinaryUri =

resource.adaptTo(URI.class);

Status

52

▪ We are hoping to start working on this soon▪ Does anyone want to help? Let’s talk

53

Direct Read-Only Access via Signed S3 URI

What About Read-Only Access?

54

Oak

S3 Data Store

Client

Read-Only Access Disallowed Because:• Only Oak knows the

real binary name• Only Oak knows

whether the client has rights to access the binary

Oak Signed URI

55

▪ Data store can deliver a signed URI on request if:▪ Backend supports it (e.g. S3, Azure)▪ User has rights to the object

▪ Short TTL▪ Issued as redirect from Sling

Oak Read-Only Access

56

Oak

S3 Data Store

Client

1. Client requests binary from Oak

2. Oak creates signed URI and returns as redirect

3. Client requests binary directly from S3

57

Q&A

Contact Info

58

Matt:▪ @mattvryan▪ linkedin.com/in/mvryan/

Conrad:▪ @mahatmac▪ linkedin.com/in/conradw

oeltge/

59