Supercomputing 2009

SmartStore: A New Metadata Organization Paradigm with Semantic-Awareness for Next-Generation File Systems

Yu Hua, Hong Jiang, Yifeng Zhu, Dan Feng, Lei Tian

SmartStore: A New Metadata Organization Paradigm with …web.eece.maine.edu/~zhu/papers/SmartStore-SC09-slides.pdf · 2009-12-29 · Supercomputing 2009 SmartStore: A New Metadata




Outline

Motivations

SmartStore System

Key Issues

Performance Evaluation

Discussion and Conclusion


Motivations

Some Facts

Storage capacity → Exabyte (or even larger)

Number of files → Billions

Metadata-based transactions → over 50%

Hierarchical directory tree → Performance Bottleneck

Inefficiency of current file systems

Static and inflexible I/O interfaces

Linear brute-force searching

Lack of full utilization of semantics


Conventional Directory Trees

Millions of files under each directory

This tree is too FAT! This tree is too HIGH!


Ideal Scenarios

User requirements

Quickly return queried results with acceptable tradeoff

Obtain knowledge of interest from the data ocean to guide higher-level services

Query for high-dimensional data

System requirements

Scalability

Reliability

Performance improvements


Intuition

Reduce search space

Not entire large-scale file system

Search correlated metadata

Configure a context related to queries

Desirable interfaces

Such as range queries and top-k queries, i.e., complex queries


Examples: Complex Queries
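To make the two query types concrete, here is a minimal brute-force sketch in Python. The record layout (file name, size, modification hour, read count), the sample values, and the function names are all hypothetical and are not taken from the slides; the linear scan is exactly the kind of search SmartStore aims to avoid and is used here only to define what each query returns.

```python
import heapq
import math

# Hypothetical metadata records: (name, size in MB, last-modified hour, read count).
FILES = [
    ("a.log", 12.0, 3.0, 40),
    ("b.dat", 250.0, 10.5, 7),
    ("c.tmp", 30.0, 11.0, 15),
    ("d.bin", 31.5, 10.0, 16),
]

def range_query(files, size_range, hour_range):
    """Return names of files whose size and modification hour lie in both ranges."""
    (s_lo, s_hi), (h_lo, h_hi) = size_range, hour_range
    return [name for name, size, hour, _ in files
            if s_lo <= size <= s_hi and h_lo <= hour <= h_hi]

def top_k_query(files, point, k):
    """Return the k file names nearest (Euclidean distance) to an attribute point."""
    def distance(record):
        return math.dist(record[1:], point)
    return [record[0] for record in heapq.nsmallest(k, files, key=distance)]

print(range_query(FILES, (20, 40), (9, 12)))
print(top_k_query(FILES, (30.0, 10.5, 15), 2))
```

Both calls scan every record, so their cost grows linearly with the number of files.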


Our Approach: SmartStore

Basic ideas:

Semantic: correlation represented by multi-dimensional attributes of file metadata

Group files based on metadata semantic correlations by using the Latent Semantic Indexing (LSI) tool

Queries and other relevant operations can be completed within one or a small number of such groups.

Our goal is to avoid or minimize the brute-force search that is widely used in a directory-tree based file system during a complex query.


Comparisons with Conventional File Systems


Grouping Procedures

Node Vector


Semantic Grouping

Design Objectives

Group sizes are approximately equal.

A file in a group has a higher correlation with other files in this group than with any file outside of the group
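The two objectives can be illustrated with a small greedy partitioning sketch. This is not the LSI-based procedure the slides refer to: as a labeled simplification, it measures correlation by plain cosine similarity on raw attribute vectors, and the names `cosine` and `group_files` are hypothetical. A fixed `group_size` keeps the groups roughly equal, and each group is filled with the files most similar to its seed.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def group_files(vectors, group_size):
    """Greedily partition files into groups of about group_size members.

    vectors maps a file name to its attribute vector (assumed layout).
    """
    remaining = dict(vectors)
    groups = []
    while remaining:
        # Take the next ungrouped file as the seed of a new group.
        seed_name = next(iter(remaining))
        seed_vec = remaining.pop(seed_name)
        # Rank the other ungrouped files by similarity to the seed.
        ranked = sorted(remaining,
                        key=lambda name: cosine(seed_vec, remaining[name]),
                        reverse=True)
        members = [seed_name] + ranked[:group_size - 1]
        for name in members[1:]:
            del remaining[name]
        groups.append(members)
    return groups

vectors = {"a": (1, 0), "b": (0, 1), "c": (0.9, 0.1), "d": (0.1, 0.9)}
print(group_files(vectors, 2))
```

A greedy pass like this does not guarantee the second objective for every file; it only tends toward it when the data has clear clusters.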


System Architecture

Grouping correlated metadata into storage and index units based on the LSI

Construction of semantic R-trees in a distributed environment

Multiple operations

[Figure labels: Insertion, Deletion, Point Query, Range Query, Top-K NN Query, Semantic Grouping, Latent Semantic Indexing]


Constructing a Semantic R-tree.

Semantic R-tree leaf nodes as storage units

The non-leaf nodes as index units

MBR representation for local metadata
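As a rough illustration of how an MBR (minimum bounding rectangle) summarizes the metadata held by a storage unit, the sketch below computes an MBR and uses it to skip units that cannot contain answers to a range query. The flat list of storage units and the function names are assumptions made for this example; a real semantic R-tree would also keep MBRs in its index units.

```python
def mbr(points):
    """Minimum bounding rectangle of points, as (lower corner, upper corner)."""
    dims = range(len(points[0]))
    lower = tuple(min(p[d] for p in points) for d in dims)
    upper = tuple(max(p[d] for p in points) for d in dims)
    return lower, upper

def intersects(box, query):
    """True if two axis-aligned boxes overlap in every dimension."""
    (b_lo, b_hi), (q_lo, q_hi) = box, query
    return all(bl <= qh and ql <= bh
               for bl, bh, ql, qh in zip(b_lo, b_hi, q_lo, q_hi))

def range_search(storage_units, query):
    """Scan only the storage units whose MBR overlaps the query box."""
    q_lo, q_hi = query
    hits = []
    for unit in storage_units:
        if not intersects(mbr(unit), query):
            continue  # pruned: no point in this unit can match
        hits += [p for p in unit
                 if all(lo <= x <= hi for x, lo, hi in zip(p, q_lo, q_hi))]
    return hits

units = [[(1, 1), (2, 3)], [(10, 10), (12, 11)]]
print(mbr(units[0]))
print(range_search(units, ((0, 0), (2, 2))))
```

The second unit is never scanned for this query because its MBR lies entirely outside the query box.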


SmartStore functions

Insertion

Deletion

On-line Query Approaches

Range Query

Top-K Query

Point Query


Key issues: on-line & off-line

Accelerate queries

Off-line pre-processing

Each storage unit locally maintains a replica of the semantic vectors of all first-level index units to speed up the queries

Lazy updating to deal with information staleness
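A minimal sketch of how a locally replicated set of semantic vectors could speed up routing: the storage unit that receives a query compares it against its replica and picks the best-matching first-level index unit without contacting other nodes. The use of cosine similarity as the matching score, the unit names, and the `route` function are assumptions for illustration, not details given in the slides.

```python
import math

def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def route(query_vector, replicated_vectors):
    """Return the index unit whose replicated semantic vector best matches the query.

    replicated_vectors maps an index-unit name to its semantic vector
    (hypothetical layout of the local replica).
    """
    return max(replicated_vectors,
               key=lambda unit: cosine(query_vector, replicated_vectors[unit]))

replica = {"I1": (1.0, 0.0), "I2": (0.0, 1.0)}
print(route((0.2, 0.9), replica))
```

Because the replica is updated lazily, the chosen unit may occasionally be stale, which is the tradeoff the slide points to.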


Key Issues: on-line vs off-line

[Figure: on-line query flow. Each node tests "Matching? Query : Forward"; step (4): if fail, continue to forward]


Key Issues: Consistency Guarantee via Versioning

Multi-replica technique can potentially lead to information staleness and inconsistency.

Lazy Versioning:

A newly created version attached to its correlated replica temporarily contains aggregated real-time changes that have not been directly updated in the original replicas

SmartStore removes attached versions when reconfiguring index units

The frequency of reconfiguration depends on the user requirements and environment constraints
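The lazy versioning idea can be sketched as a replica that carries an attached version holding pending changes. The class and method names are hypothetical, and folding the pending changes into the base state at reconfiguration time is an assumption of this sketch; the slides only state that attached versions are removed when index units are reconfigured.

```python
class Replica:
    """A replica of index-unit vectors with one attached version of pending changes."""

    def __init__(self, vectors):
        self.base = dict(vectors)  # state as of the last reconfiguration
        self.version = {}          # aggregated changes not yet applied to base

    def record_change(self, unit, vector):
        """Store a real-time change in the attached version only."""
        self.version[unit] = vector

    def lookup(self, unit):
        """Prefer the attached version so reads see the latest change."""
        if unit in self.version:
            return self.version[unit]
        return self.base.get(unit)

    def reconfigure(self):
        """Apply pending changes to the base state, then drop the version."""
        self.base.update(self.version)
        self.version = {}

replica = Replica({"I1": (1.0, 0.0)})
replica.record_change("I1", (0.5, 0.5))
print(replica.base["I1"], replica.lookup("I1"))
replica.reconfigure()
print(replica.base["I1"], replica.version)
```

How often `reconfigure` runs is the knob the slide ties to user requirements and environment constraints.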


Key issues: Mapping of Index Units

Our mapping is based on a simple bottom-up approach that iteratively applies random selection and labeling operations.
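One possible reading of the bottom-up mapping is sketched below. The slides do not give the details, so the following are assumptions: each index unit is hosted on a storage unit chosen at random from the leaves beneath it, a chosen storage unit is labeled so that no other index unit can pick it, and levels are processed from the lowest upward. The tree layout and function names are hypothetical.

```python
import random

def leaves_under(tree, node):
    """Return the storage units (leaves) below node; leaves have no tree entry."""
    children = tree.get(node)
    if children is None:
        return [node]
    return [leaf for child in children for leaf in leaves_under(tree, child)]

def map_index_units(tree, levels, rng):
    """Map each index unit to a distinct storage unit, lowest level first.

    tree maps an index unit to its children; levels lists index units per
    level, bottom-up; rng is a random.Random instance.
    """
    labeled = set()
    mapping = {}
    for level in levels:
        for unit in level:
            candidates = [s for s in leaves_under(tree, unit) if s not in labeled]
            if not candidates:
                continue  # no unlabeled storage unit left under this index unit
            choice = rng.choice(candidates)  # random selection
            labeled.add(choice)              # labeling
            mapping[unit] = choice
    return mapping

tree = {"root": ["I1", "I2"], "I1": ["S1", "S2"], "I2": ["S3", "S4"]}
levels = [["I1", "I2"], ["root"]]
print(map_index_units(tree, levels, random.Random(0)))
```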


Performance Evaluation

Prototype Implementation

Large file system-level traces, including HP, MSN, and EECS, by using Trace Intensifying Factor

Compared with typical DBMS and R-tree

Query latency reduction: 1000 times

Space savings: 20 times


Complex Queries Latency


Preliminary Simulation Results

$$\text{recall} = \frac{|T(q) \cap A(q)|}{|T(q)|}$$

T(q) is the ideal answer for query q

A(q) is the actual query results

[Figure panels: Top-8 NN Query, Range Query]
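The recall measure can be computed directly from the two result sets. This sketch assumes recall is the fraction of the ideal answer T(q) that also appears in the actual results A(q); the function name and the sample file names are hypothetical.

```python
def recall(ideal, actual):
    """Fraction of the ideal answer T(q) that appears in the actual results A(q)."""
    ideal, actual = set(ideal), set(actual)
    if not ideal:
        raise ValueError("recall is undefined for an empty ideal answer")
    return len(ideal & actual) / len(ideal)

# Two of the four ideal results were returned; "x" is a false positive.
print(recall({"a", "b", "c", "d"}, {"a", "b", "x"}))
```

False positives such as "x" do not lower recall; they would affect precision, which the slide does not report.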


On-line & off-line

[Figure: two panels plotting Latency (ms) and Message Number (1000) against Number of Data Nodes (20 to 60), with on-line and off-line series for the HP, MSN, and EECS traces]


Discussion

SmartStore works well for:

Pay-only-once: configuration efficiency for a long time due to complexity for semantic analysis;

Rich semantics of multi-dimensional attributes to guarantee the groups to match access patterns well

SmartStore does not work efficiently for:

Lack of semantics, such as uniform distribution;

Quick and dynamic evolution of semantics;

Explicit scatter of dimension increments;


Potential Applications

Users’ views

Range query and top-k query

System views

De-duplication

Caching

Pre-fetching


Conclusions

SmartStore is a new paradigm for organizing file metadata for next-generation file systems

Exploit file semantics

Complex queries

Enhance system scalability and functionality.

Methodology

Semantic aggregation

Decrease search space


Acknowledgement

This work is partially supported by

NSFC under Grant 60703046

National Basic Research 973 Program under Grant 2004CB318201

NSF CCF-0621526, NSF CCF-0937993, NSF CCF-0937988 and NSF CCF-0621493

HUST-SRF No.2007Q021B

The Program for Changjiang Scholars and Innovative Research Team in University No. IRT-0725.


Thanks & Questions
