The LCG POOL and 3D Projects. Dirk Duellmann, CERN IT-DB. Fermilab Computing Seminar, 21 October 2004


  • The LCG POOL and 3D Projects

    Dirk Duellmann, CERN IT-DB

    Fermilab Computing Seminar, 21 October 2004

    D.Duellmann, CERN

  • Outline

    What is POOL?
    POOL architecture and implementation choices
    Integration into experiment s/w frameworks
    Experience in Data Challenges
    New developments this year

    Distributed Database Deployment: project scope, constraints from application and deployment side, first ideas for a service architecture

  • What is POOL?

    A common persistency framework for LHC physics applications: Pool Of persistent Objects for LHC.

    A common effort of the LHC experiments and the CERN database and software groups, both for defining its architecture and features and for developing its components.

    Part of the LHC Computing Grid (LCG). One of the first Application Area projects (started in June 2002).

    About 25 public releases and 120 internal releases. First public release: December 2002 (0.3.0). First fully functional release: May 2003 (1.0.0). Production releases: March to June 2004 (1.6 series).

  • POOL Objectives

    To allow the multi-PB of experiment data and associated metadata to be stored in a distributed and Grid-enabled fashion, covering various types of data of different volumes.

    Hybrid technology approach, combining C++ object streaming technology (ROOT I/O) for the bulk data with transactionally safe relational database (RDBMS) services, such as MySQL or Oracle, for catalogs, collections and metadata.

    In particular, POOL provides persistency for C++ objects and transparent navigation across file and technology boundaries. It is integrated with an external File Catalog that keeps track of the physical file location, allowing files to be moved or replicated.

  • LCG Component Architecture

    POOL is a component-based system following the LCG Architecture Blueprint.

    POOL provides a technology-neutral API through abstract C++ component interfaces. This insulates the experiment framework and user code from the concrete implementation details and technologies in use today.

    POOL user code does not depend on implementation libraries: there is no link-time dependency on implementation packages (e.g. MySQL, Oracle, ROOT, Xerces-C), and backend component implementations are loaded at runtime via the LCG SEAL plug-in infrastructure.

    Three major domains, weakly coupled, interacting via abstract interfaces.
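
    The runtime loading of backends behind abstract interfaces can be sketched as a registry of factories keyed by technology name. This is only an illustration and not the SEAL plug-in API: IStorageBackend, PluginRegistry, the two stand-in backends and makeDemoRegistry are hypothetical names, and the factories are registered by hand rather than discovered in shared libraries.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Abstract, technology-neutral interface seen by user code (hypothetical name).
class IStorageBackend {
public:
    virtual ~IStorageBackend() = default;
    virtual std::string technology() const = 0;
};

// Stand-in backends; real ones would live in separately loaded libraries.
class RootBackend : public IStorageBackend {
public:
    std::string technology() const override { return "ROOT"; }
};

class SqliteBackend : public IStorageBackend {
public:
    std::string technology() const override { return "SQLite"; }
};

// Maps a technology name to a factory that creates the matching backend.
class PluginRegistry {
public:
    using Factory = std::function<std::unique_ptr<IStorageBackend>()>;

    void add(const std::string& name, Factory factory) {
        factories_[name] = std::move(factory);
    }

    // Throws if no backend was registered under the requested name.
    std::unique_ptr<IStorageBackend> create(const std::string& name) const {
        auto it = factories_.find(name);
        if (it == factories_.end())
            throw std::runtime_error("no plug-in registered for " + name);
        return it->second();
    }

private:
    std::map<std::string, Factory> factories_;
};

// Assumption: registration happens by hand here, not via a plug-in scan.
inline PluginRegistry makeDemoRegistry() {
    PluginRegistry registry;
    registry.add("ROOT", []() -> std::unique_ptr<IStorageBackend> {
        return std::unique_ptr<IStorageBackend>(new RootBackend());
    });
    registry.add("SQLite", []() -> std::unique_ptr<IStorageBackend> {
        return std::unique_ptr<IStorageBackend>(new SqliteBackend());
    });
    return registry;
}
```

    User code would request a backend by name, for example makeDemoRegistry().create("ROOT"), and would only ever hold the abstract interface, so it never links against a concrete backend.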

  • POOL Component Breakdown

    POOL is (mainly) a client-side package that couples to standard file, database and grid services. No specialized POOL servers!

    Storage Manager: streams transient C++ objects to and from disk; resolves a logical object reference to a physical object; performs I/O via ROOT (rfio/dcache) or an RDBMS (Oracle/MySQL/SQLite).

    File Catalog: maintains consistent lists of accessible files together with their unique identifiers (FileID); used to resolve the logical file reference (from a POOL Object ID) to a physical file.

    Collections: implements (large) containers for persistent objects (e.g. event collections) stored via POOL.
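
    The File Catalog role, resolving a FileID to one or more physical file names, can be sketched with an in-memory map. This is a hedged illustration only: the class FileCatalog below, its methods, and the file identifiers and paths used with it are hypothetical and do not reproduce the POOL catalog interface.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Toy catalog: one logical FileID can have several physical replicas.
class FileCatalog {
public:
    // Record that the file with this FileID is available at physicalName.
    void registerReplica(const std::string& fileId,
                         const std::string& physicalName) {
        replicas_[fileId].push_back(physicalName);
    }

    // Return all known physical names for a FileID (empty if unknown).
    std::vector<std::string> replicas(const std::string& fileId) const {
        auto it = replicas_.find(fileId);
        return it == replicas_.end() ? std::vector<std::string>() : it->second;
    }

private:
    std::map<std::string, std::vector<std::string>> replicas_;
};
```

    Because applications only store the FileID, a file can be moved or replicated by updating the catalog entry, with no change to the references held in user data.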

  • POOL on the Grid

  • POOL off the Grid

  • Navigational Access via Tokens

    Each object has a unique Object Identifier (OID), also called a POOL Token. It gives direct access to any object in the distributed store and is a natural extension of the pointer concept.

    Tokens make it possible to implement networks of persistent objects (associations) with cardinality 1:1, 1:n or n:m.

    POOL tokens are not directly exposed to the end user. Instead, POOL provides a templated smart pointer, pool::Ref<T>, which calls back to POOL to open files or database connections and to load objects into memory. Physical information such as file names or host names is not exposed.

    Token components shown on the slide: FileID, ContainerID, ObjectNo.
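
    The idea of a smart pointer that carries a token and loads its target on first use can be sketched as follows. This is a minimal illustration under stated assumptions, not the POOL implementation: Token mirrors only the three components named on the slide, while LazyRef, its Loader callback and the Track type are hypothetical.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Logical object address; field types are assumptions for this sketch.
struct Token {
    std::string fileId;
    std::string containerId;
    long objectNo;
};

// Reference that resolves its token through a callback on first dereference.
template <typename T>
class LazyRef {
public:
    using Loader = std::function<std::shared_ptr<T>(const Token&)>;

    LazyRef(Token token, Loader loader)
        : token_(std::move(token)), loader_(std::move(loader)) {}

    T& operator*() { load(); return *object_; }
    T* operator->() { load(); return object_.get(); }

    bool isLoaded() const { return static_cast<bool>(object_); }
    const Token& token() const { return token_; }

private:
    // Calls the loader at most once; later accesses reuse the cached object.
    void load() {
        if (object_) return;
        object_ = loader_(token_);
        if (!object_) throw std::runtime_error("object could not be loaded");
    }

    Token token_;
    Loader loader_;
    std::shared_ptr<T> object_;
};

// Hypothetical persistent class used only to exercise the sketch.
struct Track {
    double pt;
};
```

    In such a design the loader is the only place that knows about files, connections and hosts, which keeps physical details out of user code.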

  • Physical and Logical Data Model

    It is important to separate the two models clearly. This avoids polluting user code with unnecessary detail about file and host names, and it leaves the physical storage free to be optimised for performance or for changing access patterns without affecting applications.

  • Storage Components

    Diagram labels: Data Cache, StorageSvc, Disk Storage, Object Pointer, Persistent Address.

  • POOL Cache Access

    Diagram labels: Token, Storage Technology, Object Type, Persistent Location, Key, Object, Token2.

  • Dictionary: Population/Conversion

    Diagram label: Dictionary Generation.

  • Developments This Year

    Move to ROOT4 (POOL 2.0 line): to take advantage of automatic schema evolution and simplified streaming of STL containers. Backward compatibility for POOL 1.x files needs to be ensured. The move is currently undergoing validation by the experiments, and two branches will be released until POOL 2 is fully certified.

    File Catalog deployment issues: Data Challenge (DC) productions showed some weaknesses of grid catalog implementations, and several new or enhanced catalogs are coming up. Changes in the experiment computing models need to be taken into account. POOL tries to generalise from specific implementations and provides an open interface to accommodate upcoming components.

    Collections: several implementations of POOL collections exist. Collection cataloguing has been added in response to experiment requests; it is similar to file catalogs and re-uses the catalog implementation and command-line tools. Experiment analysis models are still being made concrete, so experience is expected from concrete analysis challenges.

  • Why a Relational Abstraction Layer (RAL)?

    Goal: vendor independence for the relational components of POOL, ConditionsDB and user code.

    Continuation of the component architecture as defined in the LCG Blueprint: file catalog, collections and object storage run against all available RDBMS plug-ins.

    To reduce the code maintenance effort: all RDBMS client components can use all supported back-ends, and bug fixes can be applied once, centrally.

    To minimise the risk of vendor binding: new RDBMS flavours can be added later or used in parallel, and they are picked up by all RDBMS clients. The RDBMS market is still in flux.

    To address the problem of distributing data in RDBMS of different flavours: a common mapping of application code to tables simplifies the distribution of RDBMS data in a generic, application-independent way.

  • Relational Access functionality

    Database schema access and manipulation: describing existing tables and creating new ones; support for primary keys, foreign keys and indices, each formed by one or more table columns.

    Data Manipulation Language: insertion, update and deletion of table rows; bulk insertions to minimise database server round trips.

    Queries: nested queries involving one or more tables; ordering and limiting the result set; control of the client cache for the result set.

    Database cursors: scalable iteration through large query results.

  • Domain Decomposition

    Pure relational data management: provides technology-neutral RDBMS connectivity and encapsulates the main differences between back-ends (e.g. table creation options). Direct clients: File Catalog, Collections and object-relational mapping.

    Object-relational mapping and storage: bridges the differences between the relational and the object world (object identity resolution, object associations) and provides guided object storage. Direct client: POOL Relational Storage Service.

    POOL Relational Storage Service: an adapter implementing the POOL StorageSvc interfaces. Direct client: experiment framework.

  • How does this fit into POOL?

  • Relational Access Layer Design

    Interface and implementation design are driven by a software requirements document co-authored by the main users and the POOL developers.

    A simple key-value pair interface (AttributeList) is used for handling and describing the relational data.

    Clean standard C++ interface: no special SQL types are exposed for data elements. A type converter is responsible for default and user-defined type conversion between C++ and SQL data types and can take advantage of vendor-specific SQL type extensions.

    Exposed SQL fragments are used only in SQL WHERE clauses. Most non-standard SQL extensions (e.g. in CREATE TABLE) are well encapsulated.
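
    A key-value description of a row, combined with a default C++ to SQL type conversion, might look like the sketch below. It is an assumption-laden illustration and not the POOL AttributeList API: Value, the make* helpers, this AttributeList class, sqlTypeOf and columnDefinitions are hypothetical, and the SQL type names are generic placeholders that a vendor plug-in could replace.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Tagged value holding one C++ datum; only three kinds for brevity.
struct Value {
    enum Kind { Integer, Real, Text };
    Kind kind;
    long long integer;
    double real;
    std::string text;
};

inline Value makeInt(long long v) { return Value{Value::Integer, v, 0.0, ""}; }
inline Value makeReal(double v) { return Value{Value::Real, 0, v, ""}; }
inline Value makeText(const std::string& v) { return Value{Value::Text, 0, 0.0, v}; }

// Ordered list of (name, value) pairs describing one table row.
class AttributeList {
public:
    // Overwrites an existing attribute or appends a new one.
    void set(const std::string& name, const Value& value) {
        for (auto& item : items_) {
            if (item.first == name) { item.second = value; return; }
        }
        items_.push_back(std::make_pair(name, value));
    }

    const std::vector<std::pair<std::string, Value>>& items() const {
        return items_;
    }

private:
    std::vector<std::pair<std::string, Value>> items_;
};

// Default type conversion (assumed mapping, not vendor specific).
inline std::string sqlTypeOf(const Value& v) {
    switch (v.kind) {
        case Value::Integer: return "INTEGER";
        case Value::Real:    return "REAL";
        case Value::Text:    return "TEXT";
    }
    throw std::logic_error("unknown value kind");
}

// Column list derived from the attributes, so callers never spell SQL types.
inline std::string columnDefinitions(const AttributeList& row) {
    std::string out;
    for (const auto& item : row.items()) {
        if (!out.empty()) out += ", ";
        out += item.first + " " + sqlTypeOf(item.second);
    }
    return out;
}
```

    The point of the design is that client code states only names and C++ values, while the choice of SQL type stays inside the layer where a back-end can override it.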

  • RDBMS plug-ins in POOL

    Oracle 9i/10g: based on OCI; supports the Oracle Instant Client; fully supports the POOL RAL interfaces; available for the Linux platforms (win32 will follow).

    SQLite: a light-weight, embeddable SQL database engine; file-based (zero configuration and administration); available for the Linux and Win32 platforms.

    MySQL: implementation based on the MyODBC driver; prototype released with POOL 1.8.

  • Object to Relational Mapping

    How to map classes to tables? Both C++ and SQL allow a data layout to be described, but with very different constraints and aims, so there is no single unique mapping.

    Fast object navigation needs a unique object identity (persistent address). This requires a unique index for addressable objects, which is part of the mapping definition.

    POOL stores the mapping with the object data, including mapping versions.

  • A Mapping Example

    class A {
      int x;
      float y;
      std::vector v;
      class B {
        int i;
        std::string s;
      } b;
    };

  • A Mapping Example

  • Mapping Elements

    A complete mapping consists of a mapping version per object and a hierarchical tree of mapping elements per version.

    Each mapping element contains: the element type (Object, Primitive, Array, POOL reference, Pointer); the database table and column names; the C++ member name and type; the lower-level associated mapping elements.

    POOL stores these persistently in 3 (hidden) relational tables
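
    The tree of mapping elements can be sketched for the class A of the mapping example. This is one plausible layout under stated assumptions, not POOL's actual default mapping: MappingElement, makeElement, makeMappingForA, countColumns and all table and column names (T_A, T_A_V, X, Y, VALUE, B_I, B_S) are hypothetical, and the element type of the vector member is left as written in the example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Element kinds as listed on the slide.
enum class ElementType { Object, Primitive, Array, PoolReference, Pointer };

// One node of the mapping tree.
struct MappingElement {
    ElementType type;
    std::string memberName;  // C++ member name (empty for the top-level class)
    std::string memberType;  // C++ type as text
    std::string table;       // database table holding this element
    std::string column;      // column name, empty if the element spans several
    std::vector<MappingElement> children;  // lower-level mapping elements
};

inline MappingElement makeElement(ElementType type,
                                  const std::string& memberName,
                                  const std::string& memberType,
                                  const std::string& table,
                                  const std::string& column) {
    MappingElement e;
    e.type = type;
    e.memberName = memberName;
    e.memberType = memberType;
    e.table = table;
    e.column = column;
    return e;
}

// Hypothetical mapping: scalars of A and of the embedded B share table T_A,
// the vector member goes to its own table T_A_V.
inline MappingElement makeMappingForA() {
    MappingElement a = makeElement(ElementType::Object, "", "A", "T_A", "");
    a.children.push_back(makeElement(ElementType::Primitive, "x", "int", "T_A", "X"));
    a.children.push_back(makeElement(ElementType::Primitive, "y", "float", "T_A", "Y"));
    a.children.push_back(makeElement(ElementType::Array, "v", "std::vector", "T_A_V", "VALUE"));

    MappingElement b = makeElement(ElementType::Object, "b", "A::B", "T_A", "");
    b.children.push_back(makeElement(ElementType::Primitive, "i", "int", "T_A", "B_I"));
    b.children.push_back(makeElement(ElementType::Primitive, "s", "std::string", "T_A", "B_S"));
    a.children.push_back(b);
    return a;
}

// Walks the tree and counts the elements that map to a single column.
inline std::size_t countColumns(const MappingElement& element) {
    std::size_t n = element.column.empty() ? 0 : 1;
    for (const auto& child : element.children) n += countColumns(child);
    return n;
}
```

    Storing such a tree per mapping version is what would let a reader of old data pick the layout that was valid when the objects were written.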

  • Generating a Mapping

    Two use cases need to be supported.

    Starting from an existing table schema and data: gives access to RDBMS data with minimal changes to the existing data; POOL generates a default header and mapping from the DB schema.
