Tom Creighton CTO, Family Search [email protected] Digital Preservation System (DPS) Architecture

Digital Preservation System (DPS) Architecture Tom ...digitalpreservation.gov/meetings/documents/storage... · Artifact Store – Utah DC2. Artifact Store – Utah DC3. Scan. Capture


Page 1

Tom Creighton, CTO, Family Search [email protected]

Digital Preservation System (DPS) Architecture

Page 2

Who We Are

• Family Search (familysearch.org)

• World’s largest genealogy organization

• In operation more than 100 years

• Non-profit

• Owned and funded by the Church of Jesus Christ of Latter-day Saints

Page 3

What We Do

• Promote family history research

• Provide online tools to support research

• Engage non-researchers

• Publish digitized genealogical records

• Preserve both digital and physical records


Page 5

Record Preservation

• Granite Mountain Records Vault

– 2+ million rolls of microfilm

– 3.6 billion images

– records from 100+ countries

Page 6

Record Preservation

• Granite Mountain Records Vault

– granite walls 200 meters thick

– climate controlled interior

Page 7

FamilySearch Preservation Volume – 1 Copy

Year   Total Artifact Count (Millions)   Yearly Additional Storage (PB)   Cumulative Storage (PB)
2012   522                               4                                6
2013   1205                              7                                13
2014   2023                              11                               23
2015   2866                              19                               42
2016   3501                              17                               59
2017   4052                              17                               76
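
The arithmetic behind the preservation volume figures can be checked with a short sketch. The numbers are copied from the slide; the function names are illustrative only. It shows that the yearly additions match the change in the cumulative column in every year except 2014, where they differ by 1 PB (presumably rounding on the slide). The last function scales the single-copy total to more copies; the copy count of 2 is an assumption based on the two artifact stores shown in the architecture slides, not a figure stated in the deck.

```python
# Figures copied from the preservation volume slide (storage in PB, one copy).
YEARS = [2012, 2013, 2014, 2015, 2016, 2017]
YEARLY_ADDED_PB = [4, 7, 11, 19, 17, 17]
CUMULATIVE_PB = [6, 13, 23, 42, 59, 76]


def cumulative_deltas(cumulative):
    """Year-over-year change in the cumulative column."""
    return [later - earlier for earlier, later in zip(cumulative, cumulative[1:])]


def rounding_mismatches(years, added, cumulative):
    """(year, reported addition, change in cumulative) where the two disagree."""
    deltas = cumulative_deltas(cumulative)
    return [
        (year, add, delta)
        for year, add, delta in zip(years[1:], added[1:], deltas)
        if add != delta
    ]


def total_for_copies(cumulative_pb, copies):
    """Raw capacity needed when every artifact is kept in `copies` stores."""
    return cumulative_pb * copies


if __name__ == "__main__":
    print(rounding_mismatches(YEARS, YEARLY_ADDED_PB, CUMULATIVE_PB))
    # Assumption: two copies, one per artifact store.
    print(total_for_copies(CUMULATIVE_PB[-1], copies=2))
```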

Page 8

Managing Scale & Complexity

• Number of artifacts – billions

• Types of digital artifacts

• Artifact size

• Storage requirements – 100s of Petabytes

• User access – 10s to 100s of thousands

• User search on structured, unstructured, and semi-structured data

• Variable access rights

• High availability for end-user access

• Very low tolerance for artifact loss

Page 9

Today: Focus on Two Elements

• Storage architecture

• Data organization

Page 10

DPS 2.0 Storage Implementation

[Diagram labels: Image Processing – Utah DC1; Scan; Capture; QA; Conversions; Digital Artifact Exchange (DAX); Artifact Store – Utah DC2; Artifact Store – Utah DC3; Artifact Ingest; Tape Storage Controller; 5000 Cartridges Each]

Page 11

DPS 3.0 Storage Implementation

[Diagram labels: Image Processing – Utah DC1; Scan; Capture; Migrate; QA; Conversions; Digital Artifact Exchange (DAX); SIP; Artifact Store A; Artifact Store B; SDB Consumer; Safety Deposit Box; Ingest; Dissemination; Storage Adapter; Tape Storage Service]

Page 12

DPS 3.x Storage Vision

[Diagram labels: Image Processing – Utah DC1; Scan; Capture; Migrate; QA; Conversions; Digital Artifact Exchange (DAX); SIP; Artifact Store A; Artifact Store B; SDB Consumer; Safety Deposit Box; Ingest; Dissemination; Storage Adapter; Tape Storage Service; Glacier Adapter]
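
The diagram names a Storage Adapter and a Glacier Adapter but does not describe their interfaces. A rough sketch of how adapters of this kind might let one ingest path write to different back ends follows. Every class, method, and identifier here is hypothetical, and the in-memory class only stands in for a real tape service or cloud archive.

```python
from abc import ABC, abstractmethod


class StorageAdapter(ABC):
    """Hypothetical interface: the slides name the adapters, not their API."""

    @abstractmethod
    def put(self, package_id: str, payload: bytes) -> None: ...

    @abstractmethod
    def get(self, package_id: str) -> bytes: ...


class InMemoryAdapter(StorageAdapter):
    """Stand-in for a real back end; keeps packages in a dict."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._objects = {}

    def put(self, package_id: str, payload: bytes) -> None:
        self._objects[package_id] = payload

    def get(self, package_id: str) -> bytes:
        return self._objects[package_id]


def ingest_to_all(adapters, package_id: str, payload: bytes):
    """Write one package to every store; return names of stores that hold it."""
    stored = []
    for adapter in adapters:
        adapter.put(package_id, payload)
        if adapter.get(package_id) == payload:  # read back to confirm the copy
            stored.append(adapter.name)
    return stored


if __name__ == "__main__":
    # Store names and the package id are made up for illustration.
    stores = [InMemoryAdapter("tape"), InMemoryAdapter("glacier")]
    print(ingest_to_all(stores, "sip-0001", b"example payload"))
```

Keeping the back ends behind one interface means the ingest code does not change when a store is added or swapped, which is the design property the adapter boxes in the diagram suggest.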

Page 13

Data Organization

• Ability to access data independent of preservation software

• Ability to access data with minimal support layers (such as HSM)

• Optimize storage cost and write speed

• Optimize for ongoing data integrity validation

• Enable efficient artifact access
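
To make the integrity-validation goal concrete, here is a minimal sketch that uses only ordinary file-system calls, in keeping with the goal of reading data without the preservation software. The choice of SHA-256 and all function names are assumptions; the slides do not say which checksum or tooling DPS uses.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def record_checksums(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def find_damage(root: Path, expected: dict) -> list:
    """Return relative paths that are missing or whose digest has changed."""
    damaged = []
    for rel_path, digest in expected.items():
        target = root / rel_path
        if not target.is_file() or sha256_of(target) != digest:
            damaged.append(rel_path)
    return damaged
```

A periodic job could call `record_checksums` once at ingest and `find_damage` on every later validation pass, comparing against the stored digests.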

Page 14

Linear Tape File System

• Previously used custom packaging using multipart MIME.

• Could use tar or maybe afx

• We have now implemented LTFS

– Specification not hard to follow

– Oracle open source library and utilities great for testing implementation.

– Great for “direct” access to data.

– Could use BagIt format within LTFS
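
Because an LTFS volume presents tape contents as ordinary files and directories, a BagIt bag could be written to it like any other directory tree. The sketch below builds a minimal bag in the layout defined by the BagIt specification (RFC 8493): a `bagit.txt` declaration, a `data/` payload directory, and a `manifest-sha256.txt` payload manifest. The function name and the idea of passing the payload as an in-memory mapping are illustrative assumptions, not anything taken from the DPS implementation.

```python
import hashlib
from pathlib import Path


def make_bag(bag_dir: Path, payload: dict) -> None:
    """Write payload files under data/ plus the two required BagIt tag files.

    `payload` maps a relative file path to its bytes content.
    """
    data_dir = bag_dir / "data"
    data_dir.mkdir(parents=True, exist_ok=True)
    manifest_lines = []
    for rel_path, content in sorted(payload.items()):
        target = data_dir / rel_path
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_bytes(content)
        digest = hashlib.sha256(content).hexdigest()
        # Manifest line: checksum, whitespace, path relative to the bag root.
        manifest_lines.append(f"{digest}  data/{rel_path}")
    (bag_dir / "bagit.txt").write_text(
        "BagIt-Version: 1.0\nTag-File-Character-Encoding: UTF-8\n",
        encoding="utf-8",
    )
    (bag_dir / "manifest-sha256.txt").write_text(
        "".join(line + "\n" for line in manifest_lines),
        encoding="utf-8",
    )
```

On a mounted LTFS volume, `bag_dir` would simply be a path under the mount point; nothing in the sketch is tape-specific.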

Page 15

LTFS Implementation Project

• Oracle helped us implement LTFS by guiding us on the SCSI commands needed to create partitions on the T10K tapes, and then updated their official documentation accordingly.

• Oracle’s OSS LTFS project let us study working source code, which also helped.

Page 16

Current Specification – v2.1 http://snia.org/sites/default/files/LTFSv2.1r0DRAFT_0.pdf

Page 17

FamilySearch AIP Store Using LTFS

• Each Digital Genealogical Society Number (DGS) identifies a “collection” of images and associated metadata

• Each DGS contains hundreds to thousands of images

• Each DGS typically becomes an AIP.

• Each AIP has a directory with subdirectories for each artifact.
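
The directory structure described here can be sketched with standard path operations. The slides do not preserve the actual naming scheme, so using the DGS number as the AIP directory name, and the sample DGS number and artifact identifiers, are assumptions made only for illustration.

```python
import tempfile
from pathlib import Path


def build_aip_layout(volume_root: Path, dgs_number: str, artifact_ids: list) -> Path:
    """Create <volume>/<DGS>/<artifact>/ directories; return the AIP directory."""
    aip_dir = volume_root / dgs_number
    aip_dir.mkdir(parents=True, exist_ok=True)
    for artifact_id in artifact_ids:
        (aip_dir / artifact_id).mkdir(exist_ok=True)
    return aip_dir


def list_artifacts(aip_dir: Path) -> list:
    """Names of the artifact subdirectories inside one AIP directory."""
    return sorted(p.name for p in aip_dir.iterdir() if p.is_dir())


if __name__ == "__main__":
    # A temporary directory stands in for a mounted LTFS volume.
    with tempfile.TemporaryDirectory() as volume:
        aip = build_aip_layout(Path(volume), "DGS-0000001", ["00001", "00002", "00003"])
        print(aip.name, list_artifacts(aip))
```

With one directory per AIP and one subdirectory per artifact, any tool that can list a directory can locate an artifact without consulting a separate index.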

Page 18

A List of AIP Directories

These are in an LTFS volume.

Page 19

A List of Artifact Subdirectories

Page 20

Example of Artifact Directory

Page 21

Thank you!