Slide 1: Scalable Distributed Data Structures, Part 2
Witold Litwin, Paris 9
[email protected]

Slide 2: High-availability LH* schemes
• In a large multicomputer, it is unlikely that all servers are up.
• Assume the probability that a bucket is up is 99%; the bucket is then unavailable about 3 days per year.
• Typical SDDSs, LH* included, store every key in one bucket.
• The probability that an n-bucket file is entirely up is then about 37% for n = 100, and practically 0% for n = 1000.

Slide 3: High-availability LH* schemes
• Using 2 buckets to store every key, one may expect about 99% for n = 100 and 91% for n = 1000 (the arithmetic is sketched below).
• High-availability SDDSs thus make sense; they are the only way to reliable large SDDS files.
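The percentages above follow from assuming independent bucket failures. A minimal sketch of the arithmetic in Python; the 99% per-bucket availability is the slides' own assumption:

    # Availability of an n-bucket file when each bucket is up with
    # probability p and failures are assumed independent.
    p = 0.99

    for n in (100, 1000):
        single = p ** n                     # every key in one bucket: all n buckets must be up
        mirrored = (1 - (1 - p) ** 2) ** n  # every key in two buckets: a key is lost
                                            # only if both of its buckets are down
        print(n, round(single, 3), round(mirrored, 3))

    # n = 100:  single ~ 0.366 (the 37% of slide 2), mirrored ~ 0.990 (99%)
    # n = 1000: single ~ 0.000,                      mirrored ~ 0.905 (91%)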
Slide 4: High-availability LH* schemes
• High-availability LH* schemes keep data available despite server failures:
  - any single-server failure
  - most two-server failures
  - some catastrophic failures
• Three types of schemes are currently known: mirroring, striping, grouping.

Slide 5: High-availability LH* schemes
• There are two files, called mirrors.
• Every insert propagates to both; the splits are nevertheless autonomous.
• Every search is directed towards one of the mirrors: the primary mirror for the corresponding client.
• If a bucket failure is detected, the spare is produced instantly at some site; the storage of the failed bucket is reclaimed and allocated to another bucket when available again.

Slide 6: High-availability LH* schemes
• Two types of LH* schemes with mirroring appear.
• Structurally-alike (SA) mirrors:
  - same file parameters
  - keys are presumably in the same buckets
• Structurally-dissimilar (SD) mirrors:
  - keys are presumably in different buckets
  - loosely coupled = same LH-functions h_i
  - minimally coupled = different LH-functions h_i

Slide 7: LH* with mirroring
[figure: SA-mirrors and SD-mirrors]

Slide 8: LH* with mirroring
• SA-mirrors: most efficient for access and spare production, but maximal loss in the case of a two-bucket failure.
• Loosely-coupled SD-mirrors: less efficient for access and spare production; smaller data loss for a two-bucket failure.
• Minimally-coupled SD-mirrors: least efficient for access and spare production; minimal loss for a two-bucket failure.

Slide 9: Some design issues
• Spare production algorithm.
• Propagating the spare address to the clients.
• Forwarding in the presence of failures.
• Discussion in: W. Litwin, M.-A. Neimat. High-Availability LH* Schemes with Mirroring. COOPIS-96, Brussels.

Slide 10: LH* with striping (LH* arrays)
• High availability through striping of a record among several LH* files, as in RAID (disk-array) schemes, but scalable to as many sites as needed.
• Less storage than for LH* with mirroring.
• But less efficient for inserts and searches: more messaging, although with shorter messages.

Slide 11: LH*S
[figure]

Slide 12: LH*S
• Spare segments are produced when failures are detected.
• Segment-file schemes are as for LH* with mirroring:
  - SA-segment files: better performance for addressing, but worse for security
  - SD-segment files
• A detailed performance analysis remains to be done.

Slide 13: Variants
• Segmentation level:
  - bit: best security; single-site data is meaningless, and so is the content of a message block
  - attribute: less CPU time for the segmentation; selective access to segments becomes possible; fastest non-key search

Slide 14: LH*g
• Avoids striping, to improve the non-key search time while keeping about the same storage requirements for the file.
• Uses grouping of k records instead; group members always remain in different buckets, despite the splits and the file growth.
• Allows for high availability on demand, without restructuring the original LH* file.

Slide 15: LH*g
[figure: primary LH* file and parity LH* file]

Slide 16: LH*g
[figure: evolution of an LH*g file before the 1st split (a), and after a few more inserts (b), (c); group size k = 3]

Slide 17: LH*g
• If a primary or a parity bucket fails, the hot spare can always be produced from the group members that are still alive (a parity sketch follows below).
• If more than one group member fails, then there is data loss,
• unless the parity file holds more extensive data, e.g. Hamming codes.
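The recovery on slide 17 rests on a RAID-style parity record per group. A minimal sketch of the idea in Python, assuming the parity record is a bytewise XOR over k fixed-length records; the record layout and helper names are illustrative, not the actual LH*g format (the scheme also allows richer codes, e.g. Hamming, per slide 17):

    from functools import reduce

    def parity(records):
        """Bytewise XOR of k fixed-length records: the group's parity record."""
        return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*records))

    def recover(survivors, parity_record):
        """Rebuild the single missing group member from the k-1 surviving
        records and the parity record (XOR is its own inverse)."""
        return parity(survivors + [parity_record])

    group = [b"alpha", b"bravo", b"delta"]   # k = 3 records, each in a different bucket
    p = parity(group)                        # stored in the parity file
    assert recover([group[0], group[2]], p) == group[1]

The same XOR idea illustrates the spare-segment production of LH*S: one parity segment per stripe lets any single lost segment be rebuilt from the survivors.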
Slide 18: Other hash SDDSs
• DDH (B. Devine, FODO-94):
  - uses Extendible Hashing as the kernel
  - clients have EH images
  - fewer overflows, more forwardings
• Breitbart & al. (ACM SIGMOD-94):
  - fewer overflows & better load
  - more forwardings

Slide 19: RP* schemes
• Produce 1-d ordered files, for range search.
• Use m-ary trees, like a B-tree.
• Efficiently support range queries; LH* also supports range queries, but less efficiently.
• Consist of a family of three schemes: RP*N, RP*C and RP*S.

Slide 20: RP* schemes
[figure]

Slide 21:
[figure]

Slide 22: RP* Range Query Termination
• Time-out.
• Deterministic (sketched below):
  - each server addressed by Q sends back at least its current range
  - the client performs the union U of all results
  - it terminates when U covers Q
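A minimal sketch of the deterministic termination test, assuming each server reply carries its current range as a half-open interval [lo, hi); the interval encoding and names are illustrative:

    def covers(query, ranges):
        """Deterministic termination test: do the returned bucket ranges,
        taken together, cover the whole query interval [lo, hi)?"""
        lo, hi = query
        for r_lo, r_hi in sorted(ranges):
            if r_lo > lo:          # gap: some addressed server has not answered yet
                return False
            lo = max(lo, r_hi)
            if lo >= hi:
                return True
        return lo >= hi

    # Query [10, 50); the replies received so far cover [0, 30) and [25, 60):
    assert covers((10, 50), [(0, 30), (25, 60)]) is True
    assert covers((10, 50), [(0, 30)]) is False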
Slide 23: RP*C client image
[figure: evolution of the client image through IAMs]

Slide 24: RP*S
An RP*S file with (a) a 2-level kernel, and (b) a 3-level kernel.
[figure: distributed index root and index pages; IAM = the traversed pages]

Slide 25: RP*N
[figure: the drawback of multicasting]

Slide 26: RP*N, RP*C, RP*S, LH*
[figure: comparison]

Slide 27:
[figure]

Slide 28:
[figure]

Slide 29: Research Frontier?

Slide 30: Kroll & Widmayer scheme (ACM SIGMOD-94)
• Provides for 1-d ordered files; a practical alternative to the RP* schemes.
• Efficiently supports range queries.
• Uses a paged distributed binary tree; it can get unbalanced.

Slide 31: k-RP*
• Provides for multiattribute (k-d) search:
  - key search
  - partial-match and range search
  - candidate-key search
• Orders of magnitude better search performance than traditional ones.
• Uses a paged distributed k-d tree: the index is on the servers; partial k-d trees are on the clients.

Slide 32: Access performance (case study)
• Three queries to a 400 MB, a 4 GB and a 40 GB file:
  - Q1: a range query which selects 1% of the file
  - Q2: query Q1 with an additional predicate on non-key attributes, selecting 0.5% of the records selected by Q1
  - Q3: a successful partial-match search x0 = c0 in a 3-d file, where x0 is a candidate key
• The response time is computed for:
  - a traditional disk file
  - a k-RP* file on a 10 Mb/s net
  - a k-RP* file on a 1 Gb/s net
• The factor S is the corresponding speed-up; it reaches 7 orders of magnitude.

Slide 33:
[figure]

Slide 34: dPi-tree
• Side pointers between the leaves:
  - traversed when an addressing error occurs
  - a limit can be set up to guard against the worst case
• Base index at some server.
• Client images: a tree built path-by-path, from the base index or through IAMs, here called correction messages.
• Basic performance similar to k-RP*? The analysis is pending.

Slide 35: Conclusion
• Since their inception in 1993, SDDSs have been the subject of an important research effort.
• In a few years, several schemes appeared:
  - with the basic functions of the traditional files: hash, primary-key ordered, multiattribute (k-d) access
  - providing for much faster and larger files
  - confirming the initial expectations

Slide 36: Future work
• Deeper analysis: formal methods, simulations & experiments.
• Prototype implementation: an SDDS protocol (ongoing at Paris 9).
• New schemes: high availability & security; R*-trees?
• Killer apps: large storage and object servers; object-relational databases (Schneider, D. & al., COMAD-94); video servers; real-time, high-performance scientific data processing.

Slide 37: END
Thank you for your attention.
Witold Litwin, [email protected]