
Schema-less databases Really…?

• In actuality, there is no such thing as a schema-less database
• In a relational database, the schema is explicit and created separately in advance
• In a column-based database, we create a fresh schema for each row, and in fact, we often reuse schema fragments from rows that are grouped together
• The same is true for document databases
• In column-based and also in document databases, users directly query the data based on the schema
• In graph-based databases, we are in essence building the schema as we build the data
• Perhaps we could say that a key-value db has no schema, but in truth, the app must be coded to look for & interpret schematic information
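The key-value point can be illustrated with a minimal Python sketch. The store here is just an in-memory dict standing in for a key-value database, and the key format `customer:<id>`, the JSON value layout, and the function names are all assumptions made for illustration. The store sees only opaque bytes; the schematic knowledge lives entirely in the application code.

```python
import json

# A toy key-value store: keys and values are opaque bytes to the "database".
store: dict[bytes, bytes] = {}

def put_customer(customer_id: int, name: str, city: str) -> None:
    """The application decides the key format and the value layout."""
    key = f"customer:{customer_id}".encode()
    value = json.dumps({"name": name, "city": city}).encode()
    store[key] = value

def get_customer_city(customer_id: int) -> str:
    """Reading requires the same schematic knowledge that was used when writing."""
    raw = store[f"customer:{customer_id}".encode()]
    record = json.loads(raw)
    return record["city"]

put_customer(1, "Ada", "Boulder")
city = get_customer_city(1)
```

If the writer changes the value layout, nothing in the store complains; only the reading code breaks, which is where the implicit schema shows itself.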


Schema updates

• In a relational database, it is almost always a big deal to change a schema
• In “schema-less” databases, the idea is to make it as easy as possible, so that we can:
    • dynamically keep structural information up to date – because today, this sort of information changes frequently.
    • keep the database online – but this does not always work, or we at least have to pull part of it offline.
    • count on the structural information of other objects to remain current – because we can surgically control exactly what objects have their schemas changed.
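As a rough sketch of "surgically" changing the structure of one object while the others stay as they are, consider the following Python example. The `customers` collection, the `phone` and `phones` fields, and both helper functions are hypothetical; a real document or column store would do this through its own update API.

```python
# Hypothetical in-memory "document collection": each document carries its own structure.
customers = {
    1: {"name": "Ada", "phone": "555-0100"},
    2: {"name": "Grace", "phone": "555-0199"},
}

def split_phone(doc: dict) -> dict:
    """Return a copy of doc with the single 'phone' field replaced by a list 'phones'."""
    updated = {k: v for k, v in doc.items() if k != "phone"}
    updated["phones"] = [doc["phone"]]
    return updated

# Change the structure of customer 1 only; customer 2 keeps its old shape.
customers[1] = split_phone(customers[1])

def phones_of(doc: dict) -> list[str]:
    """Readers must cope with both shapes while they coexist."""
    if "phones" in doc:
        return doc["phones"]
    return [doc["phone"]]
```

The price of this flexibility is visible in `phones_of`: as long as old and new shapes coexist, every reader has to know about both.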


The schema-less approach & consequences

• The general idea with schema-less databases is:
    • To treat meta data like data, as much as possible
    • To allow much more individuality for each object
• Interesting side effects of this idea
    • The database can hold much more varied forms of data
    • Data from a schema-less database could be extracted, interpreted by the application, and then structured and stored in a relational database when necessary
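The last side effect, extracting varied data and storing it relationally, might look like the sketch below. It uses Python's built-in `sqlite3` module as a stand-in for a relational database; the `documents` list, the `customer` table, and its columns are assumptions chosen for illustration.

```python
import sqlite3

# Hypothetical schema-less records: fields vary from object to object.
documents = [
    {"id": 1, "name": "Ada", "email": "ada@example.com"},
    {"id": 2, "name": "Grace"},                       # no email
    {"id": 3, "name": "Edsger", "nickname": "EWD"},   # extra field
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL, email TEXT)"
)

# The application interprets each document and maps it onto the fixed columns.
# Missing fields become NULL; fields with no matching column are dropped.
rows = [(d["id"], d["name"], d.get("email")) for d in documents]
conn.executemany("INSERT INTO customer (id, name, email) VALUES (?, ?, ?)", rows)
conn.commit()

stored = conn.execute("SELECT id, name, email FROM customer ORDER BY id").fetchall()
```

Note that the mapping step is lossy by design: the `nickname` field has no column to land in, which is exactly the individuality that the fixed schema does not accommodate.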


Language-related factors

• 1. In a schema-less database, the boundary between the db and the application is lower, as much of the query/update code is written in a conventional language

• 2. Or, perhaps we could say that the boundary is higher, because much more complex/rich things can be done to the data directly in the database

• But perhaps the deciding factor is that in a schema-less database, we don’t have many of the amenities – such as full ACID transactions – that a relational database would have, and so 1 above is closer to the truth.


Problems with the schema-less approach

• If there is no explicit schema, it can be difficult to know what to change in the application if some of the data changes format, as code in many places will be doing its own data interpretation
• If updates and queries are written in a general purpose language, it can be harder to isolate the code that needs to be changed within the database-level code
    • In a relational database, queries are fairly declarative
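To make the contrast concrete, here is a small Python sketch that answers the same question twice: once with a hand-written loop over loosely structured records, and once with a declarative SQL query. The `purchases` data, the `purchase` table, and the function name are hypothetical, and `sqlite3` is used only as a convenient relational stand-in.

```python
import sqlite3

purchases = [
    {"customer": "Ada", "item": "pen", "qty": 2},
    {"customer": "Ada", "item": "ink", "qty": 1},
    {"customer": "Grace", "item": "pen", "qty": 5},
]

# Imperative style: knowledge of field names and shapes is embedded in the code.
def total_qty_imperative(records, item):
    total = 0
    for r in records:
        if r.get("item") == item:
            total += r.get("qty", 0)
    return total

# Declarative style: structure lives in the schema, the query states what is wanted.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE purchase (customer TEXT, item TEXT, qty INTEGER)")
conn.executemany(
    "INSERT INTO purchase VALUES (:customer, :item, :qty)", purchases
)
total_sql = conn.execute(
    "SELECT SUM(qty) FROM purchase WHERE item = ?", ("pen",)
).fetchone()[0]
```

If `qty` were renamed, the SQL side fails loudly against the schema, while the loop would quietly return 0 because of the `r.get("qty", 0)` default. That silent behavior is one form of the isolation problem described above.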


The term “migrations”

• This refers to the evolution of schema information during the life-cycle of applications that use it
• In a relational database this is a big deal, but it is explicit
• In a schema-less database, we can better support incremental change
• The term is also used in MVC-based web development environments to refer to the indirect creation of schema components during the development of a web app
• Perhaps the best way to look at this term is philosophically – we want to migrate schemas, not operate in an endless offline-online loop
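One possible reading of "incremental change" is to migrate each object the first time it is touched, rather than taking the database offline to convert everything at once. The Python sketch below illustrates that idea only; the per-document version field `v`, the field names, and the helper functions are all assumptions, not something a particular database prescribes.

```python
# Hypothetical documents; "v" is an assumed per-document schema version field.
docs = {
    "a": {"v": 1, "name": "Ada Lovelace"},
    "b": {"v": 2, "first": "Grace", "last": "Hopper"},
}

def upgrade_v1_to_v2(doc: dict) -> dict:
    """Split the single 'name' field into 'first' and 'last'."""
    first, _, last = doc["name"].partition(" ")
    return {"v": 2, "first": first, "last": last}

def read(key: str) -> dict:
    """Migrate a document the first time it is read after the format change."""
    doc = docs[key]
    if doc["v"] == 1:
        doc = upgrade_v1_to_v2(doc)
        docs[key] = doc  # write back, so the store converges over time
    return doc

ada = read("a")
```

The store stays online throughout, and old-format documents disappear gradually as they are read.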


Maintaining backward compatibility

• We could create new objects or new versions of objects in order to be assured that applications can use the database as it was

• In a graph database, we could add new edges but not delete old ones

• In fact, we could view both data and metadata this way, and have an ever-growing database
    • This is not as absurd as it might sound – for legal and business reasons, we often need to keep old data
    • We can push old data off on faraway clusters
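The "new versions of objects, never delete old ones" idea can be sketched as an append-only store in Python. The `versions` mapping, the `write` and `read` functions, and the example fields are hypothetical and chosen only to show the principle.

```python
from collections import defaultdict
from typing import Optional

# Append-only store: each key maps to a list of versions, oldest first.
versions: dict[str, list[dict]] = defaultdict(list)

def write(key: str, value: dict) -> int:
    """Append a new version and return its 1-based version number."""
    versions[key].append(value)
    return len(versions[key])

def read(key: str, version: Optional[int] = None) -> dict:
    """Return the latest version, or a specific older one if requested."""
    history = versions[key]
    return history[-1] if version is None else history[version - 1]

write("item:1", {"price": 10})
write("item:1", {"price_cents": 1200, "currency": "USD"})
```

An older application can keep calling `read("item:1", 1)` and see the database as it was, while newer code reads the latest version with its changed structure.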


Reasons for using a schema

• Encapsulation gives us a structure that can serve as the scope of an operation
• We rely on structure as a differentiator so we can reuse data and retarget data
    • No structure – bits
    • Minimal structure – textual documents
    • Modest structure – relational tables
    • Medium structure – business objects
    • High structure – CAD
    • Extreme structure – photos, video, audio, language


Assignment 4

• You will build an application using PostgreSQL and Cassandra
• The application will consist of a handful of operations that you will perform on each database – you can run your operations manually and have no app
• PostgreSQL will hold your schema-based, tabular data
• Cassandra will hold your schema-variable data
• There will be two tables in PostgreSQL
    • The first holds customers who are buying items
        • Key for customer, customer names, item purchased for each row (FK referencing the primary key of the second table)
    • The second will hold the items for purchase
        • Key for item, price for item
• Cassandra will hold the buying history of each customer
    • What items purchased
    • How many of each item
    • Price paid for all of the instances of a given item – prices can change over time
• This is due at the beginning of class on Feb. 25.