BDAM: Big Data Asset Management
Mark Harrison, Mike Sundy
{mh,msundy}@pixar.com
What is Asset Management?
• Long-Lived Data
  – 50 year charter
• Large Data
  – Many TB
• Tight Data/Metadata Integration
  – Shot lists, assignments, rights management
• Scalable Data Services
  – Human, Render Farm, Build Farm Scale
Long Lived Data
• How the Templar Project was Started
• Things Change
  – Vendors
  – Software
  – File formats
  – Hardware, OS, Storage
• Your Own Requirements Change
  – How flexible, “hackable” can you be?
Large Data
• Expanding Expectations
• Harrison’s Law of 1 Terabyte
• Harrison’s Time Scale of Data
• Harrison’s Law of Mentioning Harrison
• Basic Drivers:
  – Storage: cheaper
  – Expectations: higher
  – Time: stays constant
Tight Data/Metadata Integration
• Over time, you lose information about files
• Important Information:
  – Assignments, shot lists, rights clearances
• Don’t let data disappear into a proprietary hole
Scalable Data Services
• (Picture of a single server)
• Applications need to scale appropriately
• Avoid the bottleneck of a single server (if possible)
• Infrastructure should handle data bandwidth
• Note: bottlenecks will always move, but will always exist
Templar
• Pixar’s Proprietary Asset Management System
• Handles all studio data and metadata
  – feature films, shorts, special projects
  – artwork, scripts, movie frames, simulation data, project management data
• 50 year Timeframe
  – All metadata and data can be accessed and used through 2053
Templar Asset Management
• Long-Lived Data
  – 50 year charter
• Large Data
  – Many TB
• Tight Data/Metadata Integration
  – Shot lists, assignments, rights management
• Scalable Data Services
  – Human, Render Farm, Build Farm Scale
Templar: Long Lived Data
• Federated Architecture
  – Loosely Coupled
  – Software hooks into pipeline
• Pieces can be upgraded incrementally
  – Software, file formats
• Exit Strategy Orientation
  – Standards, access to internals
Templar: Large Data
• Large, Fast Storage
  – File system caching, etc.
• Scalable Storage Software
  – proprietary system for non-revisioned files
  – Perforce
• Both horizontal and vertical scalability
Templar: Data/Metadata Integration
• “Federated” System
  – No monolithic application that “does everything”
• Instead, “best in class” programs that interoperate
  – modeling, rendering, storage, etc.
• Applications lightly coupled to metadata
• Metadata in a Relational DB, e.g. Oracle
• Expandable Metadata Schema
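One common way to get an “expandable metadata schema” in a relational DB is a fixed asset table plus a generic key/value table, so new metadata fields need new rows rather than schema changes. The sketch below uses SQLite for portability (Templar sits on a DB such as Oracle); the table and column names, paths, and field names are illustrative, not Templar’s actual schema.

```python
# Illustrative "expandable metadata schema": a fixed asset table plus a
# generic key/value table. New metadata fields need only new rows, not
# ALTER TABLE. (Names here are hypothetical, not Templar's schema.)
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE asset (
    id   INTEGER PRIMARY KEY,
    path TEXT NOT NULL            -- e.g. depot path of the file
);
CREATE TABLE asset_meta (
    asset_id INTEGER REFERENCES asset(id),
    key      TEXT NOT NULL,       -- metadata field name
    value    TEXT NOT NULL,
    PRIMARY KEY (asset_id, key)
);
""")

conn.execute("INSERT INTO asset (id, path) VALUES (1, '//tech/models/buzz.mdl')")
# Adding a brand-new metadata field is just another row:
conn.executemany(
    "INSERT INTO asset_meta (asset_id, key, value) VALUES (?, ?, ?)",
    [(1, "shot", "ts3_120"), (1, "assignee", "animator1"), (1, "rights", "cleared")],
)

rows = conn.execute(
    "SELECT key, value FROM asset_meta WHERE asset_id = 1 ORDER BY key"
).fetchall()
print(rows)
```

The trade-off of this key/value layout is weaker typing and slower ad-hoc queries than dedicated columns, which is why it usually complements, rather than replaces, a core fixed schema.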
Templar: Scalable Data Services
• Multiple Access Methods for Assets
  – File system, HTTP, direct Perforce
• Load Balancer, multiple servers (e.g. HTTP)
• File System optimizations (clusters, caching)
• Perforce: use LINKATRON
• Asynchronous Queuing
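The asynchronous-queuing idea is to keep slow work off the request path: submitters push items onto a queue and return immediately, while background workers drain it. A minimal sketch, assuming an in-process queue and hypothetical task contents (Templar’s actual queuing machinery is not described in the slides):

```python
# Minimal sketch of asynchronous queuing: producers enqueue work
# (e.g. post-checkin processing) without waiting; a background worker
# thread drains the queue. Task contents here are hypothetical.
import queue
import threading

tasks = queue.Queue()
done = []

def worker():
    while True:
        item = tasks.get()
        if item is None:                      # sentinel: shut down
            break
        done.append(f"processed {item}")      # stand-in for real work
        tasks.task_done()

t = threading.Thread(target=worker)
t.start()

# Submitters return immediately after enqueueing:
for path in ["//tech/models/buzz.mdl", "//art/concepts/woody.png"]:
    tasks.put(path)

tasks.join()      # wait for the queue to drain
tasks.put(None)   # stop the worker
t.join()
print(done)
```

In production the same shape usually appears with a persistent queue between processes, so a burst of checkins never blocks the artists who submitted them.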
Perforce
• In use at Pixar since 2000 for code only
• File revision history goes back to 1983
• First Perforce-managed film: Toy Story 3
Perforce: Long Lived Data
• Matches “exit strategy” requirements
  – All data, metadata extractable, hackable
  – “,d” magic – direct flat file storage access on back-end
• Types of Data – not just code!
  – art – reference and concept art, inspirational art for film
  – tech – show-specific data, e.g. models, textures, pipeline
  – studio – company-wide reference libraries, e.g. animation reference, configuration files, Flickr-like company photo site
  – tools – code for our central tools team, software projects
  – dept – department-specific files, e.g. marketing images
  – exotics – patent data, casting audio, data for live action shorts, story gags, theme park concepts, intern art show
Perforce: Large Data
• Vertical Scalability
  – 900 GB single file
  – 6.5 TB checkin
  – 47 TB largest single depot
  – 160 TB total Perforce storage across all depots
• Leverage Perforce features to reduce data:
  – Used +S auto-purge filetype to save 40% of storage on Toy Story 3 (1.2 TB)
  – Wrote a script to de-duplicate files using p4 checksum data; saved 1 million files and 1 TB
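The de-duplication idea is simple: group files by content checksum and keep one copy per digest. Pixar’s script leaned on digests Perforce already stores (e.g. as reported by `p4 verify` / `p4 fstat`); the sketch below is an illustrative local-filesystem version that computes MD5 itself, not the actual production script.

```python
# Sketch of checksum-based de-duplication: hash every file, group by
# digest, and report sets of identical files. Pixar's script reused
# digests already stored by Perforce; here we compute MD5 locally.
import hashlib
import os
from collections import defaultdict

def find_duplicates(root):
    by_digest = defaultdict(list)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digest = hashlib.md5(f.read()).hexdigest()
            by_digest[digest].append(path)
    # Only digests seen more than once are duplicates.
    return {d: paths for d, paths in by_digest.items() if len(paths) > 1}

if __name__ == "__main__":
    import tempfile
    tmp = tempfile.mkdtemp()
    for name, data in [("a.tx", b"same"), ("b.tx", b"same"), ("c.tx", b"diff")]:
        with open(os.path.join(tmp, name), "wb") as f:
            f.write(data)
    dupes = find_duplicates(tmp)
    print(sum(len(paths) - 1 for paths in dupes.values()), "redundant file(s)")
```

At depot scale you would hash incrementally and in chunks rather than reading whole files into memory, but the grouping logic is the same.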
Perforce: Data/Metadata Integration
• How does it integrate with Templar?
  – stores the files
  – version control
  – the “authority” for source writes
  – triggers for synchronous operations (e.g. LINKATRON)
Perforce: Scalable Data Services
• Horizontal Scalability
  – 190+ depots
  – 58 VMware servers
  – 26 million submitted changelists
• Server architecture
  – Scale out
    • Performance on one depot won’t affect another
    • Easier administration/downtime scheduling
  – Virtualization
    • 95% of physical hardware performance with greater flexibility
    • 15 minutes to build a new server
• Automated p4 server setup (squire)
  – 8 seconds to run the script to create a new p4 instance
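“squire” is Pixar’s internal tool and its details aren’t given in the slides, but the general shape of automating a new Perforce instance can be sketched: create a server root, write a small per-instance config, and assemble the `p4d` launch command (`-r` server root, `-p` listen port, `-d` daemonize are real p4d flags). Everything else below, including the function and file names, is an assumption. The sketch is a dry run: it builds the command but never executes it.

```python
# Rough sketch, in the spirit of Pixar's internal "squire" tool
# (details are guesses): provision a new p4 server instance by creating
# a server root, writing a minimal config, and assembling the p4d
# command. Dry run only: the command is returned, not executed.
import os
import tempfile

def provision_p4_instance(name, port, base=None):
    base = base or tempfile.mkdtemp()
    root = os.path.join(base, name)
    os.makedirs(os.path.join(root, "journals"), exist_ok=True)
    with open(os.path.join(root, "instance.conf"), "w") as f:
        f.write(f"name={name}\nport={port}\n")
    # p4d flags: -r server root, -p listen port, -d run as daemon.
    cmd = ["p4d", "-r", root, "-p", str(port), "-d"]
    return root, cmd

root, cmd = provision_p4_instance("tech_ts3", 1666)
print("would run:", " ".join(cmd))
```

Scripting this end to end is what turns “build a new server” from an afternoon of manual steps into the 8-second operation the slide describes.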
Conclusion
• Templar and Perforce met our four requirements:
  – Long-Lived Data
    • 50 year charter
    • confidence in retrieving data due to access to internals
  – Large Data
    • Hundreds of TB
    • 500 TB depot on horizon
  – Tight Data/Metadata Integration
    • Rock solid file management
    • users trust it
  – Scalable Data Services
    • 190 depots
    • hundreds more to come – we keep finding new uses