Distributed Database Concepts

  • 8/8/2019 Distributed Database Concepts

    Distributed databases bring the advantages of distributed computing to the database management domain. A distributed computing system consists of a number of processing elements, not necessarily homogeneous, that are interconnected by a computer network and that cooperate in performing certain assigned tasks. As a general goal, distributed computing systems partition a big, unmanageable problem into smaller pieces and solve it efficiently in a coordinated manner. The economic viability of this approach stems from two reasons: (1) more computer power is harnessed to solve a complex task, and (2) each autonomous processing element can be managed independently and can develop its own applications.

    We can define a distributed database (DDB) as a collection of multiple logically interrelated databases distributed over a computer network, and a distributed database management system (DDBMS) as a software system that manages a distributed database while making the distribution transparent to the user.

    Homogeneous and Heterogeneous Databases

    In a homogeneous distributed database, all sites have identical database management system software, are aware of one another, and agree to cooperate in processing users' requests. In such a system, local sites surrender a portion of their autonomy in terms of their right to change schemas or database management system software.

    In contrast, in a heterogeneous distributed database, different sites may use different schemas and different database management system software. The sites may not be aware of one another, and they may provide only limited facilities for cooperation in transaction processing.

    Parallel Versus Distributed Technology

    Turning our attention to system architectures, there are two main types of multiprocessor system architectures that are commonplace:

    Shared memory (tightly coupled) architecture: Multiple processors share secondary (disk) storage and also share primary memory.

    Shared disk (loosely coupled) architecture: Multiple processors share secondary (disk) storage, but each has its own primary memory.

    These architectures enable processors to communicate without the overhead of exchanging messages over a network (Note 3). Database management systems developed using the above types of architectures are termed parallel database management systems rather than DDBMS, since they utilize parallel processor technology.

    Another type of multiprocessor architecture is called shared nothing architecture. In this architecture, every processor has its own primary and secondary (disk) memory, no common memory exists, and the processors communicate over a high-speed interconnection network (bus or switch).

    Advantages of Distributed Databases

    Distributed database management has been proposed for various reasons ranging from organizational decentralization and economical processing to greater autonomy. We highlight some of these advantages here.

    1. Management of distributed data with different levels of transparency: Ideally, a DBMS should be distribution transparent in the sense of hiding the details of where each file (table, relation) is physically stored within the system. The following types of transparencies are possible:

    o Distribution or network transparency: This refers to freedom for the user from the operational details of the network. It may be divided into location transparency and naming transparency. Location transparency refers to the fact that the command used to perform a task is independent of the location of data and the location of the system where the command was issued. Naming transparency implies that once a name is specified, the named objects can be accessed unambiguously without additional specification.

    o Replication transparency: Copies of data may be stored at multiple sites for better availability, performance, and reliability. Replication transparency makes the user unaware of the existence of copies.

    o Fragmentation transparency: Two types of fragmentation are possible. Horizontal fragmentation distributes a relation into sets of tuples (rows). Vertical fragmentation distributes a relation into subrelations, where each subrelation is defined by a subset of the columns of the original relation. Fragmentation transparency means that users do not need to know how a relation has been fragmented.

    2. Increased reliability and availability: These are two of the most common potential advantages cited for distributed databases. Availability is broadly defined as the probability that a system is running (not down) at a certain time point, whereas reliability is the probability that the system runs continuously, without failure, during a time interval.

    3. Improved performance: A distributed DBMS fragments the database so that data are kept closer to where they are needed most. Data localization reduces the contention for CPU and I/O services and simultaneously reduces the access delays involved in wide area networks.

    4. Easier expansion: In a distributed environment, expansion of the system in terms of adding more data, increasing database sizes, or adding more processors is much easier.

    Properties of Distributed Databases

    Data Fragmentation

    In a DDB, decisions must be made regarding which site should be used to store which portions of the database. For now, we will assume that there is no replication; that is, each relation (or portion of a relation) is to be stored at only one site.

    Horizontal Fragmentation

    A horizontal fragment of a relation is a subset of the tuples in that relation. The tuples that belong to the horizontal fragment are specified by a condition on one or more attributes of the relation. Horizontal fragmentation divides a relation "horizontally" by grouping rows to create subsets of tuples, where each subset has a certain logical meaning. These fragments can then be assigned to different sites in the distributed system.
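As a minimal sketch of this definition, a horizontal fragment can be computed by selecting the tuples that satisfy a condition. The Python code below is illustrative only: the sample tuples, the choice of DNO as the fragmenting attribute, and the helper name `horizontal_fragment` are assumptions, not part of the text.

```python
# Hypothetical EMPLOYEE tuples; the values are made up for illustration.
EMPLOYEE = [
    {"SSN": "111", "NAME": "Ann", "SALARY": 30000, "DNO": 4},
    {"SSN": "222", "NAME": "Bob", "SALARY": 40000, "DNO": 5},
    {"SSN": "333", "NAME": "Cho", "SALARY": 25000, "DNO": 5},
]


def horizontal_fragment(relation, condition):
    """Return the subset of tuples in `relation` that satisfy `condition`."""
    return [row for row in relation if condition(row)]


# One fragment per department number; each could be stored at a different site.
emp_dno4 = horizontal_fragment(EMPLOYEE, lambda row: row["DNO"] == 4)
emp_dno5 = horizontal_fragment(EMPLOYEE, lambda row: row["DNO"] == 5)

print([row["SSN"] for row in emp_dno4])
print([row["SSN"] for row in emp_dno5])
```

Because the two conditions here are mutually exclusive and together cover every tuple, the fragments are disjoint and their union gives back the whole relation.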

    Vertical Fragmentation

    Each site may not need all the attributes of a relation, which would indicate the need for a different type of fragmentation. Vertical fragmentation divides a relation "vertically" by columns. A vertical fragment of a relation keeps only certain attributes of the relation. For example, we may want to fragment the EMPLOYEE relation into two vertical fragments. The first fragment includes personal information (NAME, BDATE, ADDRESS, and SEX), and the second includes work-related information (SSN, SALARY, SUPERSSN, DNO).
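A vertical fragment is essentially a projection onto a subset of columns, which can be sketched as follows. The sample tuples and the helper name `vertical_fragment` are hypothetical. The sketch also repeats SSN in the personal fragment; that is an assumption made here (treating SSN as the primary key) so that the two fragments could later be rejoined.

```python
# Hypothetical EMPLOYEE tuples using the attribute names from the text.
EMPLOYEE = [
    {"NAME": "Ann", "BDATE": "1980-01-01", "ADDRESS": "1 Elm St", "SEX": "F",
     "SSN": "111", "SALARY": 30000, "SUPERSSN": "222", "DNO": 4},
    {"NAME": "Bob", "BDATE": "1975-06-15", "ADDRESS": "2 Oak St", "SEX": "M",
     "SSN": "222", "SALARY": 40000, "SUPERSSN": None, "DNO": 5},
]


def vertical_fragment(relation, attributes):
    """Project every tuple of `relation` onto the given `attributes`."""
    return [{name: row[name] for name in attributes} for row in relation]


# Assumption: SSN is the primary key and is repeated in the first fragment.
personal = vertical_fragment(EMPLOYEE, ["SSN", "NAME", "BDATE", "ADDRESS", "SEX"])
work = vertical_fragment(EMPLOYEE, ["SSN", "SALARY", "SUPERSSN", "DNO"])

print(sorted(personal[0]))
print(sorted(work[0]))
```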

    Mixed (Hybrid) Fragmentation

    We can intermix the two types of fragmentation, yielding a mixed fragmentation. For example, we may combine the horizontal and vertical fragmentations of the EMPLOYEE relation given earlier into a mixed fragmentation that includes six fragments.

    A fragmentation schema of a database is a definition of a set of fragments that includes all attributes and tuples in the database and satisfies the condition that the whole database can be reconstructed from the fragments by applying some sequence of OUTER UNION (or OUTER JOIN) and UNION operations. It is also sometimes useful, although not necessary, to have all the fragments be disjoint except for the repetition of primary keys among vertical (or mixed) fragments.
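The reconstruction condition can be sketched in a few lines of Python. This is a simplified illustration, not a full implementation: the helper names and the four sample fragments are hypothetical, and a plain key join stands in for OUTER JOIN, which gives the same result here only because every key value appears in both vertical fragments.

```python
def union(*fragments):
    """UNION of horizontal fragments, keeping one copy of each tuple."""
    seen, result = set(), []
    for fragment in fragments:
        for row in fragment:
            identity = tuple(sorted(row.items()))
            if identity not in seen:
                seen.add(identity)
                result.append(row)
    return result


def join_on_key(left, right, key):
    """Join two vertical fragments on their shared primary key."""
    right_by_key = {row[key]: row for row in right}
    return [{**row, **right_by_key[row[key]]}
            for row in left if row[key] in right_by_key]


# Four hypothetical mixed fragments: split by DNO, then by column group.
names_dno4 = [{"SSN": "111", "NAME": "Ann"}]
pay_dno4 = [{"SSN": "111", "SALARY": 30000, "DNO": 4}]
names_dno5 = [{"SSN": "222", "NAME": "Bob"}]
pay_dno5 = [{"SSN": "222", "SALARY": 40000, "DNO": 5}]

# Rejoin the column groups within each department, then union the departments.
employee = union(
    join_on_key(names_dno4, pay_dno4, "SSN"),
    join_on_key(names_dno5, pay_dno5, "SSN"),
)
print(employee)
```

The repeated SSN attribute is what makes the join possible, which is why primary keys are the one permitted overlap among vertical or mixed fragments.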

    An allocation schema describes the allocation of fragments to sites of the DDBS; hence, it is a mapping that specifies for each fragment the site(s) at which it is stored. If a fragment is stored at more than one site, it is said to be replicated. We discuss data replication and allocation next.
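Since an allocation schema is a mapping from fragments to sites, a dictionary is one simple way to picture it. The fragment names, site names, and helper functions below are hypothetical and serve only to illustrate the definition.

```python
# Hypothetical allocation schema: fragment name -> set of sites storing it.
ALLOCATION = {
    "EMP_DNO4": {"site1"},
    "EMP_DNO5": {"site2", "site3"},
}


def sites_of(fragment):
    """Return the sites at which `fragment` is stored."""
    return ALLOCATION[fragment]


def is_replicated(fragment):
    """A fragment is replicated if it is stored at more than one site."""
    return len(ALLOCATION[fragment]) > 1


print(sorted(sites_of("EMP_DNO5")))
print(is_replicated("EMP_DNO4"), is_replicated("EMP_DNO5"))
```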

    Data Replication and Allocation

    Replication is useful in improving the availability of data. The most extreme case is replication of the whole database at every site in the distributed system, thus creating a fully replicated distributed database. This can improve availability remarkably because the system can continue to operate as long as at least one site is up. It also improves performance of retrieval for global queries, because the result of such a query can be obtained locally from any one site.
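The availability gain can be made concrete with a rough calculation. Assuming, purely for illustration, that each site is up independently with the same probability p (neither assumption comes from the text), the chance that at least one of n fully replicated sites is up is 1 - (1 - p)^n. The function name below is hypothetical.

```python
def availability_with_full_replication(site_availability, num_sites):
    """Probability that at least one of `num_sites` independent sites is up."""
    if not 0.0 <= site_availability <= 1.0:
        raise ValueError("site_availability must be a probability")
    if num_sites < 1:
        raise ValueError("num_sites must be at least 1")
    # All sites are down with probability (1 - p) ** n; take the complement.
    return 1.0 - (1.0 - site_availability) ** num_sites


for n in (1, 2, 3):
    print(n, availability_with_full_replication(0.9, n))
```

In practice site failures are often correlated (shared networks, shared power), so this figure should be read as an optimistic upper bound rather than a prediction.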

    The other extreme from full replication involves having no replication; that is, each fragment is stored at exactly one site. In this case all fragments must be disjoint, except for the repetition of primary keys among vertical (or mixed) fragments. This is also called nonredundant allocation.

    Between these two extremes, we have a wide spectrum of partial replication of the data; that is, some fragments of the database may be replicated whereas others may not. The number of copies of each fragment can range from one up to the total number of sites in the distributed system. A description of the replication of fragments is sometimes called a replication schema.
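The three cases (full replication, no replication, and partial replication) can be told apart by counting the copies of each fragment. The sketch below is a hypothetical illustration: the function name, the site names, and the representation of a replication schema as a dictionary of site sets are all assumptions.

```python
def classify_replication(allocation, all_sites):
    """Classify an allocation as 'full', 'none', or 'partial' replication."""
    copies = [len(sites) for sites in allocation.values()]
    if all(count == len(all_sites) for count in copies):
        return "full"      # every fragment is stored at every site
    if all(count == 1 for count in copies):
        return "none"      # nonredundant allocation
    return "partial"       # some fragments have extra copies


SITES = {"site1", "site2", "site3"}

full = {"F1": set(SITES), "F2": set(SITES)}
nonredundant = {"F1": {"site1"}, "F2": {"site2"}}
partial = {"F1": {"site1", "site2"}, "F2": {"site3"}}

for schema in (full, nonredundant, partial):
    print(classify_replication(schema, SITES))
```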

    Each fragment (or each copy of a fragment) must be assigned to a particular site in the distributed system. This process is called data distribution (or data allocation). The choice of sites and the degree of replication depend on the performance and availability goals of the system and on the types and frequencies of transactions submitted at each site. For example, if high availability is required, transactions can be submitted at any site, and most transactions are retrieval only, a fully replicated database is a good choice.

    Transparency

    The user of a distributed database system should not be required to know either where the data are physically located or how the data can be accessed at the specific local site. This characteristic, called data transparency, can take several forms:

    Fragmentation transparency. Users are not required to know how a relation has been fragmented.

    Replication transparency. Users view each data object as logically unique. The distributed system may replicate an object to increase either system performance or data availability. Users do not have to be concerned with what data objects have been replicated, or where replicas have been placed.

    Location transparency. Users are not required to know the physical location of the data. The distributed database system should be able to find any data as long as the data identifier is supplied by the user transaction.

    Data items, such as relations, fragments, and replicas, must have unique names. This property is easy to ensure in a centralized database. In a distributed database, however, we must take care to ensure that two sites do not use the same name for distinct data items.
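One way to picture both requirements (finding data from its identifier alone, and keeping names unique across sites) is a global catalog. The class below is a hypothetical sketch, not a mechanism described in the text: the name `GlobalCatalog`, its methods, and the single-owner-per-name rule are all assumptions chosen for brevity.

```python
class GlobalCatalog:
    """Hypothetical catalog mapping each global name to the site that owns it."""

    def __init__(self):
        self._owner = {}

    def register(self, name, site):
        """Record that `site` stores the data item called `name`."""
        owner = self._owner.get(name)
        if owner is not None and owner != site:
            # Two sites must not use the same name for distinct data items.
            raise ValueError(f"{name!r} is already used at {owner}")
        self._owner[name] = site

    def locate(self, name):
        """Return the site storing `name`; the caller never names a site."""
        return self._owner[name]


catalog = GlobalCatalog()
catalog.register("EMPLOYEE", "site1")
catalog.register("DEPARTMENT", "site2")
print(catalog.locate("EMPLOYEE"))
```

A user transaction supplies only the identifier, and `locate` resolves it to a site, which is the essence of location transparency. A single central catalog is itself a potential bottleneck and point of failure, so this is a conceptual illustration rather than a recommended design.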

    Read architecture from notes.