Dynamic Federations: scalable, high performance Grid/Cloud storage federations
IT-SDC: Support for Distributed Computing
Fabrizio Furano, Oliver Keeble

Fabrizio Furano, Oliver Keeble, Adrien Devresse
17 Nov 2014, HTTP Dynamic Federations, IT-SDC

Dynamic Federations
- A project started a few years ago.
- Context: promote and improve the usage of WebDAV/HTTP for high performance computing in the geographically distributed Grid environment.
- Goal: a frontend that presents what a certain number of remote or local endpoints would present if put together, without indexing them beforehand.
- Emphasis on scalability and flexibility.
- Flexible algorithmic name translations to mount remote endpoints into an apparent namespace.
- Use industry standard building blocks whenever possible.
- The endpoints can be a very broad range of objects that act as data or metadata stores. WebDAV and S3 are included.

Aggregation
[Diagram: the federated view lists /dir1 with /dir1/file1, /dir1/file2 and /dir1/file3. Storage/MD endpoint 1 holds .../dir1/file1 and .../dir1/file2. Storage/MD endpoint 2 holds .../dir1/file2 and .../dir1/file3. /dir1/file2 therefore appears with 2 replicas.]
- The federated view is what we want to see as users.
- Sites remain independent and participate in a global view.
- All the metadata interactions are hidden and done on the fly.
- NO metadata persistency is needed here, just efficiency and parallelism.

Federator
[Diagram: a client asks "Where is file X?". The frontend (Apache2 + DMLite) hosts the federator, which reaches each SE through a plugin and keeps a metadata cache.]
- The cache remembers what happened.
- The next metadata interactions will very likely be fed by the cache.
- The 2nd level cache can be shared among federators (memcached).

HTTP/WebDAV federation
- For an HTTP/WebDAV client it's just a huge, distributed repository to query.
- A solution to the "Where is file X?" problem:
  - high performance (thousands of client transactions per second) and reliability
  - takes realtime redirection choices, considering the worldwide status (instead of a static catalogue)
  - never out of sync with the status of the storage elements
  - can scale up the size of the repo
  - can scale up the number of clients
- On top of data/metadata access it also allows browsing the federated apparent namespace.
  - Gives a friendly feel to users and sysadmins.
  - A solution to the "What's in directory (= path prefix) Y?" problem.

HTTP/WebDAV federation
- For the sysadmins it's a frontend service that contacts a set of remote endpoints.
- Based on Apache plus some shared-library plugins.
- Fully C++ code.
- No service-side metadata persistency needed.
- Each endpoint provides WebDAV/HTTP/S3 content.
- Spurred DAVIX: a complete, high performance HTTP/DAV/S3 client library, available in Fedora/EPEL.
- The Federator needs only read-only metadata access.
- Each endpoint is mounted according to a directory prefix.

Dynamic Federations
- This approach of dynamic mounting is very powerful.
- It opens up a multitude of use cases, by composing a worldwide system from macro building blocks speaking HTTP and/or WebDAV:
  - Federate Grid storage.
  - Federate WebDAV Cloud services.
  - Add the content of fast changing things, like file caches.
  - Add native S3 storage backends (a supported dialect).
  - Accommodate whatever metadata sources, even two or more remote catalogues at the same time.
- Clients are redirected to the replica that is closer to them.
  - The metric is pluggable; any other metric could be implemented.
- Redirect only to endpoints that are working at that moment.

Look and feel
- What we see in the browser is an HTML rendering of a listing.
- Everything is done on the fly.
- Click on a file to download it (if your client is authorized by the endpoint SE through X509).
- Feed the URL of that file to any other client to download it.
- Click on the strange icon to get a metalink:
  - a standard representation of the locations of a file, sorted by increasing distance from the requestor
  - it's supported by multi-source download apps

Look and feel, like a normal list
[Screenshot: browser listing of the federated namespace]

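The aggregation idea from the slides (a listing of /dir1 built on the fly from two endpoints, with /dir1/file2 showing up as 2 replicas) can be illustrated with a minimal Python sketch. This is not DynaFed code: the endpoint names, remote prefixes and the in-memory `ENDPOINTS` table are hypothetical stand-ins for real WebDAV/S3 endpoints, and the "name translation" is reduced to swapping a mount prefix for a remote prefix. The sketch only shows the shape of the technique: query every endpoint in parallel, translate names, merge the results, persist nothing.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for remote endpoints. Each one is mounted under a
# prefix of the federated namespace and has its own remote path prefix.
ENDPOINTS = {
    "endpoint1": {"mount": "/dir1", "remote_prefix": "/data/site1/dir1",
                  "files": ["file1", "file2"]},
    "endpoint2": {"mount": "/dir1", "remote_prefix": "/pnfs/site2/dir1",
                  "files": ["file2", "file3"]},
}


def list_endpoint(name, fed_dir):
    """Return {federated_path: [replica location]} for a single endpoint."""
    ep = ENDPOINTS[name]
    if fed_dir != ep["mount"]:
        return {}  # this endpoint is not mounted under the requested path
    # Name translation: federated prefix -> remote prefix of this endpoint.
    return {f"{fed_dir}/{f}": [f"{name}:{ep['remote_prefix']}/{f}"]
            for f in ep["files"]}


def federated_listing(fed_dir):
    """Query all endpoints in parallel and merge their listings on the fly."""
    merged = {}
    with ThreadPoolExecutor() as pool:
        partials = pool.map(lambda n: list_endpoint(n, fed_dir), ENDPOINTS)
        for partial in partials:
            for path, replicas in partial.items():
                merged.setdefault(path, []).extend(replicas)
    return {path: sorted(reps) for path, reps in sorted(merged.items())}


if __name__ == "__main__":
    for path, replicas in federated_listing("/dir1").items():
        print(path, "->", replicas)
```

In a real deployment each `list_endpoint` call would be a metadata request to a remote site, which is where the parallelism and the metadata cache described in the slides matter.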
Interesting deployments and use cases

The demo frontend
- Our historical public testbed is a powerful machine at DESY.
  - Provided by the dCache team, cooperating with us.
- Hosts several demos, e.g. the "Interleaved" path, containing interleaved files from two sources.
- Now hosts the stable LHCb prototype.

Constructing an on the fly namespace
[Diagram: the /fed namespace]
- /fed/interleaved: 2 sites here, CERN (odd files) and DESY (even files)
- /fed/lhcb: 14/19 LHCb sites here, 15PB online
- /fed/XrdHTTP_README: bonus file coming from yet another endpoint placed on /fed

LHCb federation prototype
- The namespace of the storage elements of the LHCb experiment is quite simple and clean.
- Many sites are now deploying WebDAV access.
- Setup was simple, and now 14 sites (~15PB) are stably online.
- It just works.
- GeoIP-based redirection optimization is active.
- Official site downtimes were always detected automatically.
- The path
  /lhcb/LHCb/Collision12/BHADRONCOMPLETEEVENT.DST/ /0000/ _ _1.bhadroncompleteevent.dst
  remains constant, despite the prefix it may have, like:
  https://ccdavlhcb.in2p3.fr:2880/
  or
  https://fly1.grid.sara.nl:2882/pnfs/grid.sara.nl/data/

The BOINC data bridge
- BOINC lets you contribute computing power on your home PC to projects doing research in many scientific areas.
- The LHC experiments have some interest in it and have dedicated some effort to it. Some challenges were:
  - Seamlessly integrating the Grid storage auth domain with an external user-based auth domain.
  - Optimizing and ruggedizing the data access to/from users (slow lines, distant home users, processes put to sleep, ...).
  - Many clients, potentially large data bridge needed, scalability.
- The BOINC Data Bridge is basically a Dynamic Federation with:
  - Write to the Data Bridge enabled.
  - On-the-fly resource location among an undefined number of S3 backends.
  - GeoIP optimization of file locations.
  - Double-headed authentication: X509 (strong) and BOINC (Apache username/password). It's a bridge!
- https://indico.cern.ch/event/272793/

The BOINC data bridge
[Diagram: components are Apache (SSL), FTS, S3, MySQL and DynaFed. CRAB3 (X509) and the Grid (X509) on one side, and the BOINC user (Apache auth) on the other, issue PUT/GET requests. DynaFed answers with an HTTP redirect and signs it, and the PUT/GET then goes to S3.]
- Any number of S3 instances, in any place (prototype at CERN).
- Redirections will be optimized based on the client's location.
- Good files are moved asynchronously to the official Grid repos.

NEP-101 (Ryan Taylor, University of Victoria)
- NEP-101 is a project to enable data-intensive applications to run on distributed clouds.
  - Batch services, Software distribution, Storage Federation, Image Distribution.
- Need to use standard protocols and open-source components, and avoid anything HEP-specific.
- Have multiple clouds and SEs in various locations; cloud jobs need to find SEs.
- Planning to use EMI Dynamic Federation.

North American Clouds (Ryan Taylor, University of Victoria)
[Map]

Federation Test Deployment (Ryan Taylor, University of Victoria)
[Diagram labels: Storage Element, Web server]
- More SEs could be added for production deployment (e.g. Melbourne did).

Different frontends
- Any application that can rely on a WebDAV namespace can work seamlessly on top of a federation.
  - Only caveat: it must support redirections (which several clients do).
- Hence, in principle things like OwnCloud could work on a federation of endpoints or S3 buckets.
  - A geographically distributed repository.
  - We will be willing to try.

Federation of Cloud Storages
[Diagram: three backends federated by DynaFed into one namespace.
 Amazon S3: /atlas/bucket1/file2, /atlas/bucket2/file1, /lhcb/bucket3/file6
 Ceph S3: /atlas/specialdata/bucket5, /atlas/bucket2/file1
 Openstack Swift: /atlas/bucket2/file1, /userBob/kitty.png
 DynaFed view: /atlas/specialdata/bucket5, /atlas/bucket2/file1, /lhcb/bucket3/file2, /userBob/kitty.png, /atlas/bucket1/file2, /lhcb/bucket3/file6]
- One namespace for several Clouds
- Designed to scale
- Fully in memory
- Metadata caching
- Geo-Redirection
- Standard interface

Why federating Cloud Storage
- Extend your existing resources with Cloud Storage.
- Add/remove resources on demand.
- Inter-Cloud data replication, huge composite repositories.
- Geo-Redirection.
- Load-balancing / failover.
- Federated identity.
- Use your own authorization scheme.
- Answer: combine the advantages of a federation with the flexibility of the Cloud.

Conclusions and next steps
- Getting close to large production setups.
- Bridge Web, Grid and Cloud tech with tools that scale.
  - Keep the usual metadata DBs for more batch-like use cases; use the dynamic system for realtime data access.
- Make high performance data storage/access scale geographically in the Web+Grid+Cloud case.
- A way to explore user-friendly interfaces for large, geographically distributed repos.
- New Dynafed release in one week.
- Add missing endpoints to the LHCb prototype.
- Use the new features to evaluate the ATLAS case (>40 sites, >200 spacetokens [= places where to search]).
- Support the BOINC data bridge and the Canadian project.
- Get experience with aggregating many S3 endpoints.
- Get experience in managing file caches instead of stable storages.
- Investigate the possible usage in the context of personal file sync tools with very large, distributed repos.
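
The geo-redirection and failover behaviour mentioned throughout the slides (redirect a client to the closest replica, using a pluggable metric, and only to endpoints that are working at that moment) can be sketched as follows. This is an illustration under stated assumptions, not DynaFed's implementation: the replica URLs, the `up` flag and the coordinates are all hypothetical, and the client position is passed in directly instead of being derived from a GeoIP lookup. The default metric is a plain great-circle (haversine) distance.

```python
import math


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def rank_replicas(client, replicas, metric=haversine_km):
    """Drop replicas whose endpoint is down, then sort by increasing metric."""
    alive = [r for r in replicas if r["up"]]
    return sorted(alive, key=lambda r: metric(client, r["location"]))


# Hypothetical replicas of one file; coordinates are rough city positions.
REPLICAS = [
    {"url": "https://se-geneva.example.org/dir1/file2",
     "location": (46.2, 6.1), "up": False},
    {"url": "https://se-hamburg.example.org/dir1/file2",
     "location": (53.6, 10.0), "up": True},
    {"url": "https://se-amsterdam.example.org/dir1/file2",
     "location": (52.4, 4.9), "up": True},
]

if __name__ == "__main__":
    client = (46.0, 6.0)  # assumed client position
    for replica in rank_replicas(client, REPLICAS):
        print(replica["url"])
```

The first element of the ranked list is a natural redirection target, and the whole list in that order corresponds to the "sorted by increasing distance from the requestor" ordering that the slides describe for metalinks. Swapping `metric` for another callable is how a different metric would plug in within this sketch.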