Introduction - UZHfiles.ifi.uzh.ch/dbtg/ddbs/HS16/sl01-2x2.pdf · Distributed Database Systems Fall 2018 Introduction SL01 I Syllabus and Course Project I Data Independence and Distributed

Distributed Database SystemsFall 2018

IntroductionSL01

I Syllabus and Course ProjectI Data Independence and Distributed ComputingI Definition of Distributed DatabasesI Promises of Distributed DatabasesI Technical Problems to be StudiedI Architectural Models

DDBS18, SL01 1/69 M. Böhlen

This Course/1

Syllabus1. Introduction2. Distributed Database Design3. Distributed Query Processing4. Distributed Query Optimization5. Distributed Transaction Management


This Course/2

I Location: BIN 2.A.01I Time: Monday 14:00 - 15:45I Course page http://www.ifi.uzh.ch/dbtgI I assume you have followed an introductory database course before

(relational model; algebra; SQL; query processing; etc).I The course is based on the textbook Principles of Distributed

Database Systems from Tamer Öszu and Patrick Valduriez,Springer, 3rd edition, 2010.


Two Generals’ Problem/1

I Task: Generals A and B must negotiate a time to attack.I Rules:

I A and B must attack at the same timeI A and B synchronize through messengersI Messengers can get lost

source: Zoltan Fern, Stanford University, cs347 notes



I How many messages are needed, so that both know for sure when toattack?




I We must relax rules to get a solution.I Probalistic approach with retransmissions.


I What is the probability that B attacks at time t?I B attacks at t if he receives message to do so no later than t.I Each message delivered with probability p.I One message per time unit.



I a(1) = pI a(2) = p + (1 − p)pI a(3) = p + (1 − p)p + (1 − p)2pI a(4) = p + (1 − p)p + (1 − p)2p + (1 − p)3p

0

0.2

0.4

0.6

0.8

1

1 1.5 2 2.5 3 3.5 4 4.5 5

time

Probability of successful transmission

0.90.80.70.5


Data Processing Systems/1

Two seminal contributions:

I S. Ghemawat, H. Gobioff, S. Leung, The Google file system,SOSP 2003: A scalable, distributed, fault-tolerant file systemtailored for data-intensive applications, running on inexpensivecommodity hardware

I J. Dean, S. Ghemawat, MapReduce: Simplified Data Processingon Large Clusters, OSDI 2004: A simple programming model andan implementation for processing large data sets on computingclusters.


Data Processing Systems/2Google File System (GFS):I Master manages metadataI Files broken into chunks (typically 64MB)I Chunks are replicated across three machinery for fault-toleranceI Data transfer happens directly between clients and chunk servers.


Data Processing Systems/3MapReduce: hides details of distributed programmingI Computations are expressed with two functions: Map and Reduce.I The Map function produces a set of intermediate key/value pairs.I The MapReduce framework groups all key/value pairs with the same

key and passes them to the Reduce function.I The Reduce function receives an intermediate key with its set of

associated values and merges them.

source: Sherif Sakr, IfI summer school 2016


Data Processing Systems/4Hadoop (= HDFS + MapReduce + ...)I open-source SW stack for data-intensive distributed applicationsI processes very large amount of unstructured and complex data.I runs on a large number of machines that don’t share memory/disks.I clones the Google’s GFS and MapReduce framework.





I Hive: SQL over Apache HadoopI Impala: open source, native analytic database for Apache HadoopI Tajo: distributed data warehouse system for Apache Hadoop.I Giraph: iterative graph processing system built for high scalabilityI Hama: analytics using the Bulk Synchronous Parallel (BSP) modelI Mahout: an environment for scalable machine learning applications


Distributed DBS versus MapReduce/1(MapReduce: A major step backwards [David DeWitt, 2008])

I MapReduce is a giant step backward in the programming paradigmfor large-scale data intensive applications becauseI Schemas are good.I Separation of the schema from the application is good.I High-level access languages are good.

I MapReduce is a sub-optimal implementation, in that it uses bruteforce instead of indexing.I MapReduce has no indexes and therefore has only brute force as a

processing option.I MapReduce systems deal poorly with skew.I Reducers that run multiple reduce tasks simultaneously induce many

disk seeks.


Distributed DBS versus MapReduce/2(MapReduce: A major step backwards [David DeWitt, 2008])

I MapReduce misses most of the features that are routinely includedin current DBMS.I Bulk loader to transform input data in files into a desired format and

load it into a DBMSI Indexing as noted aboveI Updates to change the data in the data baseI Transactions to support parallel update and recovery from failures

during updateI Integrity constraints to help keep garbage out of the data baseI Referential integrity to help keep garbage out of the data baseI Views so the schema can change without having to rewrite the

application program



I Thus, MapReduce is not the ulimate cure to all your datamanagement problems.

I Todays data processing systems are domain-specific, optimized andvertically focused systems.




I Apache Spark is a fast, general engine for large scale dataprocessing on a computing cluster (new engine for Hadoop).

I Developed initially at UC Berkeley in Scala.I RDD (Resilient Distributed Dataset), an in-memory data

abstraction, is the fundamental unit of data in Spark.I Resilient: if data in memory is lost, it can be recreated from the

lineage (how it was computed).I Distributed: stored in memory across the cluster.I Dataset: data can come from a file or be created programmatically.I Spark programming consists of performing operations (e.g., Map,

Filter) on RDDs.



I Apache Flink is a distributed in-memory data processing frameworkwhich represents an alternative to the MapReduce framework thatsupports both of batch and realtime processing.

I Flink has originated from the Stratosphere research project that wasstarted at the Technical University of Berlin.

I Instead of the map and reduce abstractions, Flink uses a directedgraph approach that leverages in-memory storage for improving theperformance of the runtime execution.

I True streaming capabilities: Execute everything as streamsI Native iterative executionI DataSet API, DataStream API, Table API



I Pregel from Google is the first BSP (Bulk Synchronous Parallel)based implementation for graph processing

I Communication through message passing (usually sent alongoutgoing edges from each vertex) + Shared-Nothing graphprogramming.

I Vertex-centric computation, each vertex:I Receives messages sent in the previous superstepI Executes the same user-defined functionI Modifies its valueI If active, sends messages to other vertices (next superstep)I Votes to halt if it has no further work to do





Database Processing Systems/11

I Column Stores: MonetDB, Greenplum, C-Store (Vertica)I Non-relational Databases: BigTable, Cassandra, HBase, SpannerI Document/XML Stores: MongoDB, CouchDB, BaseX,I Key-value Stores: Berkeley DB, NoSQL, Tokyo CabinetI Graph Databases: Neo4J, InfiniteGraph, Virtuoso,I Object Databases: GemStone, db4o, ZODB,I Cluster Computing: SparkI Graph Processing: Pregel, Giraph, GraphLab, GraphXI Event Processing: Storm, Flink, S4


Database Courses

I Datenbankpraktikum,A. Geppert, Tuesday 16.15-18.00, FS

I Distributed Database Systems (DDBS),M. Böhlen, Monday 14.00-15.45, HS even years

I XML and Databases (TSDM),C. Türker, Thursday 8.00-9.45, FS

I Data Warehousing (DW),A. Geppert, Tuesday 16.15-18.00, FS even years

I Temporal and Spatial Data Management (TSDM),M. Böhlen, Monday 14.00-15.45, HS odd years


Course Project/1

I hear and forgetI learn and rememberI do and understand

I Goal: in-depth understanding of a selected part of the courseI Task: Implement a solution from a research paperI Outcome: Running implementation and 5-10 page report


Course Project/2

Choose one of the following papers:I O. Polychroniou, R. Sen, and K. A. Ross, Track Join: Distributed

Joins with Minimal Network Traffic, ACM SIGMOD InternationalConference on Management of Data, pp. 1483-1494, 2014.

I C. Mohan, B. Lindsay, and R. Obermarck, TransactionManagement in the R* Distributed Database ManagementSystem, ACM Transactions on Database Systems, pages 378-396,1986.

I L. Michael, W. Nejdl, O. Papapetrou, W. Siberski, Improvingdistributed join efficiency with extended bloom filteroperations, 21st International Conference on Advanced InformationNetworking and Applications (AINA 2007), pp. 187-194, 2007.


Course Project/3

I Course project shall be solved in group of 2 people.I 4th week, October 8:

I Presentation of problem in terms of a numeric exampleI Plan for implementation (components, SW, functionality)I Plan for evaluation

I 9th week, November 12:I Presentation of solution (problem definition and solution)I Strong and weak points of implementationI Demonstration of implementation

I 14th week, December 17:I hand in of project report and implementation (source code, data)I send pdf by email to Kevin Wellenzohn ([email protected])


Schedule

17.09 Introduction to Distributed Database Systems24.09 Distributed Database Design01.10 Distributed Database Design08.10 Course project: plan for implementation and example15.10 Distributed Database Design22.10 Semantic Data Control29.10 Query Processing05.11 Distributed Query Optimization12.11 Course project: demonstration of implementation19.11 Distributed Query Optimization26.11 Distributed Transactions and Concurrency03.12 Distributed Transactions and Concurrency10.12 Distributed Transactions and Concurrency17.12 Conclusion


Logistics

I There is an oral exam at the end of the course.I The exam takes place January 7, 2019 starting at 14:00.I The oral exam lasts half an hour in total.I There will be questions related to your project (implementation and

understanding).I There will be a question about a selected topic (drawn randomly)

from the course.I It is important that you illustrate the answer to your question

through an appropriately chosen example.I Grade: 50% project, 50% course


Data Independence/1

I In the old days, programs stored data in regular filesI Each program has to maintain its own data

I huge overheadI error-prone


Data Independence/2

I The development of DBMSs helped to fully achieve dataindependence.

I Provide centralized and controlled data maintenance and access.I Application is immune to physical and logical file organization.


Data Independence/3

I Distributed database system is the union of what appear to be twodiametrically opposed approaches to data processing: databasesystems and computer networkI Computer networks promote a mode of work that goes against

centralizationI Key issues to understand this combination

I The most important objective of DB technology is integration notcentralization

I Integration is possible without centralization, i.e., integration ofdatabases and networking does not mean centralization

I Goal of distributed database systems: achieve data integration anddata distribution transparency


Distributed Computing

I A distributed computing system is a collection of autonomousprocessing elements that are interconnected by a computer network.

I The elements cooperate in order to perform the assigned task.I The term “distributed” is used very broadly. The exact meaning of

the word depends on the context. What can be distributed?I Processing logicI FunctionsI DataI Control

I Classification of distributed systems with respect to various criteriaI Degree of coupling, i.e., how closely the processing elements are

connected; e.g., measured as ratio of amount of data exchanged toamount of local processing; weak coupling, strong coupling

I Interconnection structure; e.g., point-to-point connection betweenprocessing elements, common interconnection channel

I Synchronization; synchronous, asynchronous


Definition of DDB and DDBMS/1

I A distributed database (DDB) is a collection of multiple, logicallyinterrelated databases distributed over a computer network

I A distributed database management system (DDBMS) is thesoftware that manages the DDB and provides an access mechanismthat makes this distribution transparent to the users

I DDBS (distributed database system) = DDB + DDBMSI Assumptions

I Data stored at a number of sites each site logically consists of asingle processor

I Processors at different sites are interconnected by a computer network(we do not consider multiprocessors in DDBS, cf. parallel systems)

I DDB is a database, not a collection of files. Data is logically related;structured into multiple files; accessed through common interface.

I DDBMS is a collection of DBMSs.


Definition of DDB and DDBMS/2


Definition of DDB and DDBMS/3I Example: Database consists of 3 relations employees,

projects, and assignment which are partitioned and stored atdifferent sites (fragmentation).

I We study the problems with queries, transactions, concurrency, andreliability.


What is not a DDBS?

I The following systems are different from (though related to)distributed DB systems

Shared Memory Shared Disk

Shared Nothing Central Databases


Promises of DDBSs

Distributed Database Systems provide the following advantages:I Transparency of distributed and replicated dataI Higher reliabilityI Improved performanceI Easier system expansion


Promises of DDBSs – Transparency/1 01.5

TransparencyI Refers to the separation of the higher-level semantics of the system

from the lower-level implementation issuesI A transparent system “hides” implementation details from users.I A fully transparent DBMS provides high-level support for the

development of complex applications.

(a) User wants to see one databa-se

(b) Programmer sees many databases


Promises of DDBSs – Transparency/2

I The goal of transparency is to provide a system where users do nothave to worry about representation, fragmentation, location, andreplication of data.

I Various forms of transparency can be distinguished for DDBSs:I Network transparency (also called distribution transparency)

I Location transparencyI Naming transparency

I Replication transparencyI Fragmentation transparency

I Horizontal fragmentationI Vertical fragmentation



I Network (or distribution) transparency allows a user to perceivea DDBS as a single, logical entity

I The user is protected from the operational details of the network (oreven does not know about the existence of the network)

I Two types of network transparency can be identified: locationtransparency and naming transparency.

I The user does not need to know the location of data items and acommand used to perform a task is independent from the location ofthe data and the site the task is performed (location transparency)

I A unique name is provided for each object in the database (namingtransparency)I In absence of this, users are required to embed the location name as

part of an identifier



Common (suboptimal) ways to get naming transparency at theapplication level:I Solution 1: Create a central name server; however, this results in

I loss of some local autonomyI central site may become a bottleneckI low availability (if the central site fails remaining sites cannot create

new objects)

I Solution 2: Prefix object with identifier of site that created itI e.g., branch created at site S1 might be named S1.BRANCHI Also need to identify each fragment and its copiesI e.g., copy 2 of fragment 3 of Branch created at site S1 might be

referred to as S1.BRANCH.F3.C2I Difficult for userI Relocation of objects becomes difficult


Promises of DDBSs – Transparency/5

I Replication transparency ensures that the user is not involved inthe managment of copies of some data

I The user should even not be aware about the existence of replicas,rather should work as if there exists a single copy of the data

I Replication of data is needed for various reasonsI e.g., increased efficiency for read-only data access



I Fragmentation transparency ensures that the user is not aware ofand is not involved in the fragmentation of the data

I The user is not involved in finding query processing strategies overfragments or formulating queries over fragmentsI The evaluation of a query that is specified over an entire relation but

now has to be performed on top of the fragments requires anappropriate query evaluation strategy

I Fragmentation is commonly done for reasons of performance,availability, and reliability

I Two fragmentation alternativesI Horizontal fragmentation: divide a relation into a subsets of tuplesI Vertical fragmentation: divide a relation by columns


Promises of DDBSs – Reliability

Higher reliabilityI Replication of components and data should make DDBMS more

reliable.I No single points of failureI e.g., a broken communication link or processing element does not

bring down the entire systemI Distributed transaction processing guarantees the consistency of the

database and concurrency


Promises of DDBSs – Performance

Improved performanceI Proximity of data to its points of use

I Reduces remote access delaysI Requires some support for fragmentation and replication

I Parallelism in executionI Inter-query parallelismI Intra-query parallelism

I Update and read-only queries influence the design of DDBSssubstantiallyI If mostly read-only access is required, as much as possible of the data

should be replicatedI Writing becomes more complicated with replicated data


Promises of DDBSs – Extendability

Easier system expansionI Issue is database scalingI Emergence of microprocessor and workstation technologies

I Network of workstations much cheaper than a single mainframecomputer.

I Automatic data sharing has become cheap (coordination involvinghumans is expensive).

I Increasing database size (a central system might not even befeasible).


Technical Problems to be Studied/1

I Distributed database designI How to fragment the data?I Partitioned data vs. replicated data?

I Distributed query processingI Design algorithms that analyze queries and convert them into a series

of data manipulation operationsI Distribution of data, communication costs, etc. has to be consideredI Find optimal query plans


Technical Problems to be Studied/2 01.3

I Distributed concurrency controlI Synchronization of concurrent accesses such that the integrity of the

DB is maintainedI Integrity of multiple copies of (parts of) the DB have to be

considered (mutual consistency)

I ReliabilityI How to make the system resilient to failuresI Atomicity and durability

I Many techniques and solutions for DDBMSs build on standardDBMS solutions and extend these to distributed settings.


Architecture

I Architecture: The architecture of a system defines its structure:I the components of the system are identified;I the function of each component is specified;I the interrelationships and interactions among the components are

defined.I Applies both for computer systems as well as for software systems,

e.g,I division into modules, description of modules, etc.I architecture of a computer

I There is a close relationship between the architecture of a system,standardisation efforts, and a reference model.


ANSI/SPARC Architecture of DBMSI ANSI/SPARC architecture is based on dataI 3 views of data: external view, conceptual view, internal viewI Defines a total of 43 interfaces


Example/1I Conceptual schema: Provides enterprise view of entire database

RELATION EMP [KEY = {ENO}ATTRIBUTES = {

ENO : CHARACTER(9)ENAME: CHARACTER(15)TITLE: CHARACTER(10)

}]

RELATION PAY [KEY = {TITLE}ATTRIBUTES = {

TITLE: CHARACTER(10)SAL : NUMERIC(6)

}]

RELATION PROJ [KEY = {PNO}ATTRIBUTES = {

PNO : CHARACTER(7)PNAME : CHARACTER(20)BUDGET: NUMERIC(7)LOC : CHARACTER(15)

}]

RELATION ASG [KEY = {ENO,PNO}ATTRIBUTES = {

ENO : CHARACTER(9)PNO : CHARACTER(7)RESP: CHARACTER(10)DUR : NUMERIC(3)

}]


Example/2

I Internal schema: Describes the storage details of the relations.I Relation EMP is stored on an indexed fileI Index is defined on the key attribute ENO and is called EMINXI A HEADER field is used that might contain flags (delete, update,

etc.)

INTERNAL_REL EMPL [INDEX ON E# CALL EMINXFIELD = {HEADER: BYTE(1)E# : BYTE(9)ENAME : BYTE(15)TIT : BYTE(10)

}]

Conceptual schema:RELATION EMP [KEY = {ENO}ATTRIBUTES = {ENO : CHARACTER(9)ENAME: CHARACTER(15)TITLE: CHARACTER(10)

}]


Example/3

I External view: Specifies the view of different users/applications

I Application 1: Calculates the payroll payments for engineers

CREATE VIEW PAYROLL (ENO, ENAME, SAL) ASSELECT EMP.ENO,EMP.ENAME,PAY.SALFROM EMP, PAYWHERE EMP.TITLE = PAY.TITLE

I Application 2: Produces a report on the budget of each project

CREATE VIEW BUDGET(PNAME, BUD) ASSELECT PNAME, BUDGETFROM PROJ


Architectural Models for DDBSs/1I Architectural Models for DDBSs (or more generally for multiple

DBMSs) can be classified along three dimensions:I AutonomyI DistributionI Heterogeneity


Architectural Models for DDBSs/2

I Autonomy: Refers to the distribution of control (not of data) andindicates the degree to which individual DBMSs can operateindependently.I Tight integration: a single-image of the entire database is available to

any user who wants to share the information (which may reside inmultiple DBs); realized such that one data manager is in control ofthe processing of each user request.

I Semiautonomous systems: individual DBMSs can operateindependently, but have decided to participate in a federation tomake some of their local data sharable.

I Total isolation: the individual systems are stand-alone DBMSs, whichknow neither of the existence of other DBMSs nor how to comunicatewith them; there is no global control.



I Autonomy has different dimensionsI Design autonomy: each individual DBMS is free to use the data

models and transaction management techniques that it prefers.I Communication autonomy: each individual DBMS is free to decide

what information to provide to the other DBMSsI Execution autonomy: each individual DBMS can execture the

transactions that are submitted to it in any way that it wants to.



I Distribution: Refers to the physical distribution of data overmultiple sites.I No distribution: No distribution of data at allI Client/Server distribution:

I Data are concentrated on the server, while clients provide applicationenvironment/user interface

I First attempt to distributionI Peer-to-peer distribution (also called full distribution):

I No distinction between client and server machineI Each machine has full DBMS functionality


Architectural Models for DDBSs/5 01.6

I Heterogeneity: Refers to heterogeneity of the components atvarious levelsI hardwareI communicationsI operating systemI DB components (e.g., data model, query language, transaction

management algorithms)


Client-Server Architecture/1

I General idea is to divide the functionality of the database systeminto two classes:I server functions: mainly data management, including query

processing, optimization, transaction management, etc.I client functions: might also include some data management functions

(consistency checking, transaction management, etc.) not just userinterface

I Client and server refer to actual machines (i.e., different machineswhere the SW runs).

I Leads to a more efficient division of the work.I Yields a two-tier architecture (a further distribution of the

functionality leads to n-tier architectures with application servers).I Different types of client/server architecture

I Multiple client/single serverI Multiple client/multiple server


Client-Server Architecture/2

I One server, many clientsI Similar to centralized DBMSsI differences are execution of

transactions and cache management.


Client-Server Architecture/3I Database server, application serversI General application server; heavier clients


Client-Server Architecture/401.7

I Many database and application serversI Specialized application servers; lighter clients


Peer-to-Peer Architecture/1I Data organizational view (architecture with equal peers).I Provides transparency since it is an extension of the ANSI/SPARC

architecture.I Local internal schema (LIS)

I Describes the local physical data organization (which often is differenton each machine)

I Local conceptual schema (LCS)I Describes logical data organiza-

tion at each siteI Required since the data are

fragmented and replicatedI Global conceptual schema (GCS)

I Describes the global logical viewof the data

I Union of the LCSsI External schema (ES)

I Describes the user/application view of the dataDDBS18, SL01 61/69 M. Böhlen

Peer-to-Peer Architecture/2

I Detailed components of adistributed peer-to-peerdatabase systems.

I User and data processor arepresent on each machine (this isorganizational separation; notfunctional separation as forclient/server)

I User processor handlesinteractions withusers/applications.

I Data processor handles storageand data processing.


Multi-DBMS Architecture/1

I Individual DBMSs are fully autonomous and have no concept ofcooperation.

I “Data integration system” is an often used term for such systems.I Fundamental difference to peer-to-peer DBMS is in the definition of

the global conceptual schema (GCS)I In a MDBMS the GCS represents only the collection of some of the

local databases that each local DBMS want to share.I The GCS is a (proper) subset of the union of the LCSs (no complete

view exists)I This leads to the question, whether the GCS should even exist in a

MDBMS.I Two different architecutre models:

I Models with a GCSI Models without GCS


Multi-DBMS Architecture/2I Model with a GCS

I GCS is the union of parts of the LCSsI Local DBMS define their own views on the local DB


Multi-DBMS Architecture/3I Model without a GCS

I The local DBMSs present to the multi-database layer the part of theirlocal DB they are willing to share.

I External views are defined on top of LCSs


Multi-DBMS Architecture/4I Full-fledged DBMSs, each managing a different database.I SW layer on top that allows to access various databases.I For individual DBMSs the SW layer is simply yet another application.I Implemented through mediator/wrapper technology.


Conclusion/1

I A distributed database (DDB) is a collection of logically interrelateddatabases distributed over a computer network

I Data is stored at a number of sites, the sites are connected by anetwork.

I DDBS supports the relational model. DDBS is not a remote filesystem.

I Transparent system ‘hides’ implementation details from usersI Distribution/network transparencyI Replication transparencyI Fragmentation transparency

I Higher reliability; improved performance; easier system expansionI Programming a distributed database involves:

I Distributed database designI Distributed query processingI Distributed concurrency control


Conclusion/2

I Architecture defines the structure of the system. There are differentways to define the architecture: e.g., based on components or data

I DDBS might be based on identical components (homogeneoussystems) or different components (heterogeneous systems)

I ANSI/SPARC architecture defines external, conceptual, and internalschemas

I There are three orthogonal implementation dimensions for DDBS:level of distribution, autonomity, and heterogeinity

I Different architectures are discussed:I Client-Server SystemsI Peer-to-Peer SystemsI Multi-DBMS


Course project, October 10

I October 10, 14:00 - end:I no lectureI group project presentations of about 20 minutes in my office (BIN

2.E.13)I Structure of presentations:

I Presentation of the problem with a numeric exampleI Plan for implementation (components, HW, SW, functionality)I (plan for evaluation)

I October 3: Send email to [email protected] with the followinginformation: group members, emails, possible constraints for theOctober 10 meeting

I A plan with the exact slots for the presentations will be distributed(web page).


Documents

Introduction - UZHfiles.ifi.uzh.ch/dbtg/ddbs/HS16/sl01-2x2.pdf · Distributed Database Systems Fall 2018 Introduction SL01 I Syllabus and Course Project I Data Independence and Distributed