
CASE STUDY

VoltDB and the 5G Revolution

Michael Stonebraker

VoltDB is the commercialization of H-Store, a prototype built at M.I.T. in the mid-2000s. One can think of them as having very similar architectures, and we lump them together in this paper under the mnemonic V-H. The design of V-H, as conceived in 2004, stressed maximum performance: very high throughput in transactions per second (TPS) at predictably low latency, measured in milliseconds. The rationale was that disk-based DBMSs would get faster with the advent of faster secondary storage such as SSDs and NVRAM. Hence, RAM-based DBMSs had to be architected for ultra-high performance to keep a significant performance advantage over traditional systems going forward.

Therefore V-H made the following three important choices:

A focus on single-partition transactions

A multi-node main-memory DBMS must partition data across the various nodes. Multi-node transactions inevitably involve an expensive distributed concurrency control protocol. As noted in [Harding], distributed concurrency control slows down operation considerably. If there is significant contention between transactions, then throughput can be even more dramatically reduced. To avoid this overhead, V-H focused on optimizing so-called single-partition transactions.

In this case, an application designer is expected to organize their data so that almost no transactions span data on multiple nodes. Many applications are naturally “single-part”, such as updating the balance of a single subscriber and checking that subscriber's plan policy to authorize a phone call. Because each of these transactions reads and writes only one subscriber's data, any partitioning of subscriber accounts across multiple nodes makes it span only one partition. On the other hand, moving money from one account to another usually cannot be made single-part, because there is often no way to cluster both accounts in a single partition.
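To make the single-partition idea concrete, here is a minimal Python sketch, assuming hash partitioning on a subscriber key across a hypothetical cluster of 8 partitions. The names `NUM_PARTITIONS`, `partition_for`, `partitions_touched`, and `is_single_partition` are illustrative only; this is not VoltDB's hash function or API.

```python
import zlib

NUM_PARTITIONS = 8  # hypothetical cluster size, chosen for illustration


def partition_for(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partitioning key to a partition using a stable hash.

    zlib.crc32 is used instead of Python's built-in hash(), which is
    randomized per process for strings and so would not be stable.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def partitions_touched(keys: list[str]) -> set[int]:
    """Partitions a transaction must visit to read or write these keys."""
    return {partition_for(k) for k in keys}


def is_single_partition(keys: list[str]) -> bool:
    """True if every key the transaction touches lives on one partition."""
    return len(partitions_touched(keys)) == 1


# Authorizing a call touches one subscriber, so it is always single-partition.
print(is_single_partition(["subscriber-42"]))  # True

# A transfer between two accounts is single-partition only if both keys
# happen to land on the same partition.
print(is_single_partition(["subscriber-42", "subscriber-7"]))
```

A transaction whose keys all map to one partition can run on that partition alone, with no distributed concurrency control; the two-account transfer needs two partitions whenever the keys hash apart.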

In summary, many applications can be made single-part, but some cannot. Also, several very large applications insist that all transactions be single-part: they forbid multi-part transactions as an application architecture best practice, so as to maximize performance.

V-H chose to optimize for “single-part” transactions, which it can execute with very high performance. Although V-H is happy to execute “multi-part” transactions, they do not run at comparable performance. One should use VoltDB on workloads that are predominantly single-part.

A focus on stored procedures

Most OLTP use cases consist primarily of repetitive transactions; hence, there are a few high-volume transaction types. Executing these through ODBC/JDBC is a very bad idea, because a single transaction requires several round trips in the protocol. The same drawback applies to NoSQL applications with record-at-a-time interfaces, where records are shipped one by one to the client application for processing. Not only does this create multiple client-server round trips, but it also puts unnecessary pressure on network bandwidth. Instead, one should use a stored procedure interface. In this case, the code for the transaction (a mix of Java and SQL) is moved into the DBMS, where it can be executed with a single round-trip message. When Sybase introduced stored procedures in the mid-1980s, they offered a factor of 5 or so performance advantage compared to an ODBC/JDBC interface.

As a result, V-H focused on making a stored procedure interface operate at very high performance.
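The round-trip argument can be illustrated with a toy Python model. `ToyServer`, `charge_client_side`, and `charge_procedure` are hypothetical; the sketch only counts messages and does not model VoltDB's actual stored-procedure API. A real client-driven transaction would typically need further round trips (for example, an explicit commit), so the two statements shown here understate the usual count.

```python
from typing import Optional


class ToyServer:
    """Hypothetical in-memory server that counts client round trips."""

    def __init__(self) -> None:
        self.balances = {"subscriber-42": 100}
        self.round_trips = 0

    def execute(self, op: str, key: str, value: Optional[int] = None):
        """One statement per call, as with a JDBC/ODBC-style interface."""
        self.round_trips += 1
        if op == "select":
            return self.balances[key]
        if op == "update":
            self.balances[key] = value
            return None
        raise ValueError(f"unknown op {op!r}")

    def call_procedure(self, procedure, *params):
        """Run a whole transaction inside the server in one round trip."""
        self.round_trips += 1
        return procedure(self.balances, *params)


def charge_client_side(server: ToyServer, key: str, amount: int) -> bool:
    """The client drives the transaction, one statement per round trip."""
    balance = server.execute("select", key)
    if balance < amount:
        return False
    server.execute("update", key, balance - amount)
    return True


def charge_procedure(balances: dict, key: str, amount: int) -> bool:
    """The same logic packaged as a (hypothetical) stored procedure."""
    balance = balances[key]
    if balance < amount:
        return False
    balances[key] = balance - amount
    return True


jdbc_style = ToyServer()
charge_client_side(jdbc_style, "subscriber-42", 30)
proc_style = ToyServer()
proc_style.call_procedure(charge_procedure, "subscriber-42", 30)
print(jdbc_style.round_trips, proc_style.round_trips)  # 2 1
```

The transaction logic is identical in both cases; only where it runs changes, and with it the number of messages per transaction.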

A focus on active-active replication

Essentially all OLTP applications require high availability (HA). This requires every object to be replicated multiple times, and a crash requires the system to fail over to a backup. During normal processing, V-H must ensure that every transaction is processed at all copies or at none. If this is accomplished, then V-H can “fail over” as the result of a crash and continue operation, without data corruption.

There are two plausible tactics for performing replica updates:

Active-active replication. A transaction is executed at all replicas and committed locally at all nodes. In this case all replicas are “active” and transaction processing occurs at every site with a replica. For example, AT&T will have East and West Coast customers talking to the nearest cluster, with active-active replication syncing in the background.

Active-passive replication. One copy is designated the primary copy and every transaction is executed there first. Log records are written at this site and then moved over the network to the backup sites. At each backup, the log is rolled forward to bring the secondary into synchronization with the primary.
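The two tactics differ in what crosses the network: active-active ships the transaction itself, while active-passive ships the log records it produced. Below is a minimal Python sketch of both, with hypothetical names (`Replica`, `add_minutes`, `replicate_active_active`, `replicate_active_passive`) and an in-process dictionary standing in for each replica's data; it ignores failures, networking, and commit protocols.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    minutes: dict = field(default_factory=dict)


def add_minutes(minutes: dict, subscriber: str, used: int) -> dict:
    """Transaction body; returns the rows it changed (their after-images)."""
    minutes[subscriber] = minutes.get(subscriber, 0) + used
    return {subscriber: minutes[subscriber]}


def replicate_active_active(replicas, txn, *params) -> None:
    # Ship the transaction itself; every replica executes and commits it locally.
    for replica in replicas:
        txn(replica.minutes, *params)


def replicate_active_passive(primary, backups, txn, *params) -> None:
    # Execute once at the primary, then ship its log records to the backups,
    # which roll them forward instead of re-executing the transaction.
    log_records = txn(primary.minutes, *params)
    for backup in backups:
        backup.minutes.update(log_records)


east, west = Replica(), Replica()
replicate_active_active([east, west], add_minutes, "subscriber-42", 5)

primary, backup = Replica(), Replica()
replicate_active_passive(primary, [backup], add_minutes, "subscriber-42", 5)

print(east == west == primary == backup)  # True
```

Both tactics leave identical replicas in this example, but active-active stays consistent only if every replica executes the same transactions in the same order and gets the same result.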

Given that there are two plausible strategies, which one should be chosen? Several years ago, [Malvaiya] implemented single-replica crash recovery in VoltDB. He compared two tactics:

1. Writing a command log and re-running the commands when recovering

2. Writing a data log and rolling the log forward at recovery time.

He found the command log had negligible overhead during execution and was therefore a great deal faster than the data logging approach. [Yu] extended this code to replication and implemented both active-active and active-passive replication. He found that active-active was the performance winner by almost a factor of two.
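The gap in logging overhead can be sketched in Python, assuming a transaction that updates many rows. A command log records only the procedure and its parameters, while a data log records an after-image for every changed row; recovering from a snapshot with either log reproduces the same state. All names are hypothetical, and real data logs may record more than after-images. Note that replaying commands assumes the procedures are deterministic.

```python
import copy


def raise_rate(db: dict, plan: str, delta: int) -> list:
    """Hypothetical transaction: raise the rate of every subscriber on a plan.

    Returns (key, after-image) pairs for the changed rows, i.e. what a
    data log would have to record.
    """
    changed = []
    for key, row in db.items():
        if row["plan"] == plan:
            row["rate"] += delta
            changed.append((key, dict(row)))
    return changed


def run_and_log(db: dict, transactions: list) -> tuple:
    command_log, data_log = [], []
    for procedure, params in transactions:
        command_log.append((procedure, params))  # one small record per transaction
        data_log.extend(procedure(db, *params))  # one record per changed row
    return command_log, data_log


def recover_from_commands(snapshot: dict, command_log: list) -> dict:
    db = copy.deepcopy(snapshot)
    for procedure, params in command_log:
        procedure(db, *params)  # re-run each command in its original order
    return db


def recover_from_data(snapshot: dict, data_log: list) -> dict:
    db = copy.deepcopy(snapshot)
    for key, after_image in data_log:
        db[key] = dict(after_image)  # roll the log forward
    return db


snapshot = {f"sub-{i}": {"plan": "basic" if i % 2 else "pro", "rate": 10}
            for i in range(10_000)}
live = copy.deepcopy(snapshot)
commands, data = run_and_log(live, [(raise_rate, ("basic", 1)), (raise_rate, ("pro", 2))])
print(len(commands), len(data))  # 2 10000
assert recover_from_commands(snapshot, commands) == live
assert recover_from_data(snapshot, data) == live
```

Two transactions produce two command-log records but ten thousand data-log records, which is the intuition behind the command log's negligible runtime overhead.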

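As a result, VoltDB focused on active-active replication. Active-active requires a deterministic concurrency control strategy: because each replica executes every transaction independently, the replicas remain identical only if they all produce the same results, which means conflicting transactions must execute in the same order everywhere. V-H serendipitously uses such a strategy. In contrast, most of V-H's competitors use a concurrency control strategy (e.g., dynamic locking, optimistic concurrency control, multi-version concurrency control) that is not deterministic. Hence, active-active is not an option for those systems, and the factor-of-two speed advantage is not available to them.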
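Why determinism matters can be shown with two conflicting transactions. The Python sketch below assumes each transaction is stamped with a hypothetical global sequence number before it reaches the replicas; it is a simplified illustration of deterministic serial ordering, not a description of how VoltDB actually orders transactions. A nondeterministic scheme may commit conflicting transactions in different orders at different sites, which the "arrival order" runs imitate.

```python
def top_up(db: dict) -> None:
    db["credit"] += 10


def reset_credit(db: dict) -> None:
    db["credit"] = 100


def run(transactions) -> dict:
    """Execute transactions serially, in the order given, on a fresh replica."""
    db = {"credit": 50}
    for _, body in transactions:
        body(db)
    return db


# Each transaction carries a (hypothetical) global sequence number.
submitted = [(1, top_up), (2, reset_credit)]

# Without a deterministic order, two sites may see different interleavings.
site_a = run(submitted)                  # top_up, then reset: credit 100
site_b = run(list(reversed(submitted)))  # reset, then top_up: credit 110
print(site_a == site_b)  # False

# With a deterministic order (here: sort by sequence number), they converge.
ordered = sorted(submitted, key=lambda t: t[0])
print(run(ordered) == run(sorted(reversed(submitted), key=lambda t: t[0])))  # True
```

Because `top_up` and `reset_credit` do not commute, only a shared order guarantees that independently executing replicas end in the same state.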

In aggregate, these three decisions allow V-H to execute transactions an order of magnitude (or more) faster than other main-memory DBMSs. In benchmarks ([Somagani], [Acme]), V-H has been shown to run 1M transactions per second on reasonably sized clusters. So far, this is faster than any customer workload we know of. Hence, V-H's competitors could possibly run these workloads, albeit with an order of magnitude more hardware.

However, this situation is about to change dramatically. The forcing function is 5G.

5G promises higher bandwidth, leading to higher device density (up to one million devices per sq. km), and millisecond latency. This density of devices is forcing new Radio Access Network (RAN) cell technology to avoid saturating existing networks. In turn, this will exponentially increase the database TPS required to:

• Update the state change information for every device in the network

• Exercise the real-time authentication and authorization policy for every new device communication

In addition, network slicing is a 5G requirement, with a portion of the network dedicated to each use case, such as industrial IoT, video, and VR. Each slice needs instantaneous decision making for load balancing and quality-of-service assurance for the increased number of subscribers (people + IoT devices).

To illustrate the increased TPS, let us consider a typical wireless carrier: a modest-sized one might support 10M phones; a larger one might have 150M. A typical wireless carrier has numerous high-volume transactional applications. Here are a few examples:

Billing and Charging: Current networks bill in increments of 6 seconds, i.e., 10 times per minute. If the average phone has a duty cycle of 10%, then a small network of 10M phones has about 1M phones active at any moment, generating 10M billing events per minute, or roughly 167,000 per second. For larger networks, the number is much higher. Over time, IoT devices are expected to at least quadruple this volume. As such, billing will be a very high TPS application with uncompromising consistency requirements at millisecond latency.
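The billing estimate is simple arithmetic that can be checked directly. The sketch below assumes that an active phone generates one billing event per 6-second increment and that 10% of phones are active at any moment; the function name and the `growth_factor` parameter (used for the expected IoT quadrupling) are hypothetical.

```python
def billing_events_per_second(phones: int,
                              duty_cycle: float = 0.10,
                              increment_s: float = 6.0,
                              growth_factor: float = 1.0) -> float:
    """Billing events per second for a network.

    Assumes a phone generates one billing event per increment while it is
    active, and that `duty_cycle` is the fraction of phones active at once.
    `growth_factor` scales the result, e.g. 4 for a quadrupling due to IoT.
    """
    active_phones = phones * duty_cycle
    return active_phones / increment_s * growth_factor


for phones in (10_000_000, 150_000_000):
    today = billing_events_per_second(phones)
    with_iot = billing_events_per_second(phones, growth_factor=4)
    print(f"{phones:,} phones: {today:,.0f}/s today, {with_iot:,.0f}/s with 4x growth")
```

Under these assumptions a 10M-phone network produces about 167,000 billing events per second (about 667,000 with four times the volume), and a 150M-phone network about 2.5M per second (10M with four times the volume).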

New services: IoT devices are expected to continue enabling new services in a 5G world. These will include medical alert applications that connect to emergency personnel when a person falls or experiences a medical abnormality. When subscribers are highly concentrated, as in a football stadium, it would be possible to dynamically spin up a geo-fenced sub-network to handle the spike in connectivity. Smart metering will enable personalized customer experience and communication, as well as facilitate analysis of grid consumption and energy demand and compliance with new regulatory requirements. 5G will also allow continuous monitoring and predictive maintenance of IoT sensors in wind and solar farms.

In summary, 5G will enable applications such as these, which will drive transaction rates through the roof.

In addition, most wireless applications (e.g., billing) consist of single-partition transactions and are in the VoltDB sweet spot. For these kinds of applications, we expect that even modest-sized wireless carriers will have to support millions of transactions per second.

Most main-memory DBMSs cannot support this kind of volume. The exception is VoltDB, which has already been shown to scale to these numbers. If you have a KTPS application (thousands of transactions per second), then there are many solutions to your problem. If you anticipate MTPS (millions of transactions per second), then check out VoltDB.

References

[Harding] http://www.vldb.org/pvldb/vol10/p553-harding.pdf

[Malvaiya] http://hstore.cs.brown.edu/papers/voltdb-recovery.pdf

[Yu] http://www.cs.cmu.edu/~pavlo/blog/2013/12/fall-2013-research.html

[Somagani] https://www.voltdb.com/blog/2018/11/06/benchmarking-voltdb-on-the-cloud/

[Acme] https://www.voltdb.com/blog/2015/11/17/comparing-cloud-performance-ycsb/

About VoltDB

VoltDB powers applications that require real-time intelligent decisions on streaming data for a connected world, without compromising on ACID requirements. No other database can fuel applications that require a combination of speed, scale, volume and accuracy.

Architected by the 2014 A.M. Turing Award winner, Dr. Mike Stonebraker, VoltDB is a ground-up redesign of the relational database for today’s growing real-time operations and machine learning challenges. Dr. Stonebraker has conducted research on database technologies for more than 40 years, leading to numerous innovations in fast data, streaming data and in-memory databases. With VoltDB, he realized the full potential of tapping streaming data with in-memory transactional database technology that can handle data’s speed and volume while delivering real-time analytics and decision making. VoltDB is a trusted name in the industry, already validated by leading organizations such as Nokia, Financial Times, Mitsubishi Electric, HPE, Barclays, and Huawei.

© VoltDB, Inc. 209 Burlington Road, Suite 203, Bedford, MA 01730 voltdb.com

27 December 2018