Tatyana Matvienko, Senior Java Developer, Big Data Storages

Big Data Storages

Agenda

[Big] Data Source: when does it become Big?

What is a cluster? Horizontal and vertical scaling

[Big] Data Storage challenges

Disadvantages

NoSQL = Not only SQL

Most popular and trendy

Tech Example: Apache Cassandra architecture

Demo

Big Data Storage Concepts

Only stores facts (events), doesn't analyze them

Immutable

Time series data (based on timestamps and, maybe, origin)

Store everything, delete nothing

Where: messages (email, Twitter), social networks, sensor data (IoT), log files, locations
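The "immutable, time series, delete nothing" idea can be sketched as a small append-only log. This is only an illustration in Python, not how any particular storage engine works; the names Event, EventLog, append and between are hypothetical, and everything is kept in memory.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: a stored fact cannot be modified afterwards
class Event:
    timestamp: int  # e.g. epoch seconds
    origin: str     # hypothetical source id, such as a sensor name
    payload: str


class EventLog:
    """Append-only store: events are added and read, never updated or deleted."""

    def __init__(self):
        self._timestamps = []  # sorted timestamps, parallel to _events
        self._events = []

    def append(self, event):
        # Keep both lists ordered by timestamp so range reads can use bisect.
        i = bisect.bisect_right(self._timestamps, event.timestamp)
        self._timestamps.insert(i, event.timestamp)
        self._events.insert(i, event)

    def between(self, start, end):
        """Return events with start <= timestamp < end, oldest first."""
        lo = bisect.bisect_left(self._timestamps, start)
        hi = bisect.bisect_left(self._timestamps, end)
        return self._events[lo:hi]
```

There is deliberately no update or delete method: analysis happens elsewhere, on top of the stored facts.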

Cluster. Horizontal and vertical scaling

What is a cluster?

Load balancer

Communication: master/slave architecture

Fault tolerance and replication factor
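A minimal sketch of how a load balancer and fault tolerance fit together, assuming a simple round-robin policy (the slides do not say which policy is meant). RoundRobinBalancer, mark_down, mark_up and pick are hypothetical names used for illustration only.

```python
class RoundRobinBalancer:
    """Hands out nodes in turn and skips the ones marked as down."""

    def __init__(self, nodes):
        self._nodes = list(nodes)
        self._down = set()
        self._next = 0  # index of the node to try first on the next pick

    def mark_down(self, node):
        self._down.add(node)

    def mark_up(self, node):
        self._down.discard(node)

    def pick(self):
        # Try each node at most once, starting from the current position.
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next]
            self._next = (self._next + 1) % len(self._nodes)
            if node not in self._down:
                return node
        raise RuntimeError("no healthy nodes left")
```

Adding a node to the list is horizontal scaling; the balancer keeps serving requests while at least one node is up.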

Big Data Storage Challenges

Size (keep and search huge amount of data)

Speed (data acquisition, data search)

Availability (fault tolerance, partition tolerance)

Disadvantages of Big Data Storages

No transactions (ACID)

Less mature

Big variety of concepts, lack of standardization

No BI or analytics in queries

Administration

Distributed File storage

Amazon

Storages: Key-Value

Examples: Redis, DynamoDB, MemcacheDB, Riak KV, Aerospike, OrientDB

Storages: Document oriented

Examples: Apache CouchDB, Couchbase, MongoDB

Storages: Graphs

Examples: AllegroGraph, Neo4j, OrientDB, Titan

Storages: Column based

Examples: Cassandra, HBase, Accumulo, Vertica

Why Cassandra?

Apache Cassandra: basics

Masterless architecture with read/write anywhere design

All nodes are the same

No single point of failure

Zone support

Linear scalability

CQL - Cassandra Query Language

Availability and partition tolerance, at the cost of eventual consistency
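Cassandra lets the client choose how many replicas must acknowledge each read and write. A common rule of thumb is that a read is guaranteed to see the latest acknowledged write when read acks + write acks > replication factor. The arithmetic can be sketched as follows; the function names are hypothetical and this is not driver code.

```python
def quorum(replication_factor):
    """Smallest majority of replicas, e.g. 2 out of 3."""
    return replication_factor // 2 + 1


def read_sees_latest_write(replication_factor, write_acks, read_acks):
    """True when every read replica set overlaps every write replica set."""
    return write_acks + read_acks > replication_factor
```

With a replication factor of 3, quorum writes plus quorum reads (2 + 2 > 3) overlap, while single-replica writes and reads (1 + 1) do not, which is where eventual consistency shows up.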

Partitioning and Replication
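Partitioning and replication on a token ring can be sketched as below. This is a simplified model with stated assumptions: each node owns a single token (no virtual nodes), MD5 stands in for Cassandra's actual partitioner, and replicas are simply the next nodes clockwise on the ring. Ring, token and replicas are hypothetical names.

```python
import bisect
import hashlib


def token(key):
    """Map a string to a 64-bit position on the ring (MD5 is a stand-in hash)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class Ring:
    def __init__(self, nodes, replication_factor):
        if replication_factor > len(nodes):
            raise ValueError("replication factor exceeds number of nodes")
        # Each node is placed on the ring at the hash of its name.
        self._ring = sorted((token(n), n) for n in nodes)
        self._tokens = [t for t, _ in self._ring]
        self._rf = replication_factor

    def replicas(self, partition_key):
        """Return the nodes that store the given partition key."""
        # First node whose token is >= the key's token, wrapping around at the end.
        start = bisect.bisect_left(self._tokens, token(partition_key)) % len(self._ring)
        return [self._ring[(start + i) % len(self._ring)][1] for i in range(self._rf)]
```

Because any node can compute the same replica list from the key alone, no master is needed to route a request.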

Data modeling

Demo