Cloudius Systems presents:
Seastar
Avi Kivity, April 13, 2015
SeaStar Technology
● New tech, runs on physical machines, VMs, Linux/OSv
● Multi-million IOPS, fully scalable
● Perfect building block for database/filesystem/cache
● Share-nothing, fully asynchronous model
● Open Source
SeaStar current performance
SeaStar
[Figure: Before: thread model. After: SeaStar shards.]
Problem with today's programming model
+ Single-core performance (frequency, IPC) is no longer growing
+ Core counts grow, but the cores are hard to utilize; apps don't scale
+ Locks have costs even without contention
+ Data is allocated on one core, then copied and used on others
+ Software can't keep up with recent hardware (SSDs, 10 Gbps line rate, NUMA, etc.)
[Figure: traditional stack. Application threads sit on top of a shared kernel (TCP/IP, scheduler, queues), NIC queues, and shared memory.]
SeaStar Framework
Linear scaling by #core
+ One engine runs on each core
+ Shared-nothing per-core design
+ Fits the existing shared-nothing distributed application model
+ Full kernel bypass, supports zero-copy
+ No threads, no context switches, and no locks
+ Instead, asynchronous lambda invocation
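The shared-nothing design above can be illustrated with a small simulation. This is a minimal sketch, not Seastar code: `shard`, `sharded_store`, `owner_of`, `submit`, and `run_all` are hypothetical names, and the "cores" are simulated in one thread by draining each shard's inbox in turn. It only shows the ownership rule: every key belongs to one shard, and other parties post a lambda to the owner instead of taking a lock.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical names; a single-threaded simulation of per-core shards.
struct shard {
    std::unordered_map<std::string, int> data;      // touched only by its owner
    std::deque<std::function<void(shard&)>> inbox;  // lambdas sent to this shard
};

class sharded_store {
    std::vector<shard> shards_;
public:
    explicit sharded_store(std::size_t n) : shards_(n) {
        assert(n > 0 && "need at least one shard");
    }

    // Each key has exactly one owner, so its data never needs a lock.
    std::size_t owner_of(const std::string& key) const {
        return std::hash<std::string>{}(key) % shards_.size();
    }

    // Do not touch another shard's memory: post a lambda to its inbox.
    void submit(const std::string& key, int delta) {
        shards_[owner_of(key)].inbox.push_back(
            [key, delta](shard& self) { self.data[key] += delta; });
    }

    // One pass of every shard's event loop. In a real deployment each
    // loop would run on its own core; here they simply run one by one.
    void run_all() {
        for (auto& s : shards_) {
            while (!s.inbox.empty()) {
                auto task = std::move(s.inbox.front());
                s.inbox.pop_front();
                task(s);
            }
        }
    }

    // Read a key from its owning shard (0 if it was never written).
    int get(const std::string& key) const {
        const auto& d = shards_[owner_of(key)].data;
        auto it = d.find(key);
        return it == d.end() ? 0 : it->second;
    }
};
```

Submitted updates stay invisible until the owning shard runs its loop, which is the asynchronous part of the model; a production framework would replace the `std::deque` with a cross-core queue.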
[Figure: SeaStar's sharded stack. Each core runs its own userspace stack: application, TCP/IP, task scheduler with task queues and an smp queue, and a NIC queue on DPDK. The kernel isn't involved.]
SeaStar Framework Comparison
[Figure: traditional stack (application threads over a shared kernel with TCP/IP, scheduler, queues, NIC queues, and shared memory) next to SeaStar's sharded stack (one userspace stack per core on DPDK; the kernel isn't involved).]
Traditional stack:
■ Lock contention
■ Cache contention
■ NUMA unfriendly
SeaStar's sharded stack:
■ No contention
■ Linear scaling
■ NUMA friendly
SeaStar handles millions of connections in parallel!
Traditional stack vs. SeaStar's sharded stack
[Figure: per-CPU comparison. Traditional stack: a scheduler on each CPU switches between threads, each with its own stack. SeaStar's sharded stack: each CPU runs a queue of promises and tasks.]
SeaStar's sharded stack:
■ A promise is a pointer to an eventually computed value
■ A task is a pointer to a lambda function
■ No sharing, millions of parallel events
Traditional stack:
■ A thread is a function pointer
■ A stack is a byte array from 64 KB to megabytes
■ Context switch cost is high; large stacks pollute the caches
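A quick back-of-the-envelope calculation shows why stack size matters at this scale. The sketch below takes one million in-flight events and the smallest stack size named on the slide (64 KB); the 128-byte task size is purely an assumption for illustration, not a figure from the talk, and all names are hypothetical.

```cpp
#include <cstdint>

// Values marked "assumed" are illustrative, not measurements from the talk.
constexpr std::uint64_t kKiB = 1024;
constexpr std::uint64_t kEvents = 1'000'000;         // one million parallel events
constexpr std::uint64_t kMinStackBytes = 64 * kKiB;  // smallest stack on the slide
constexpr std::uint64_t kAssumedTaskBytes = 128;     // assumed: one lambda plus captures

// Memory needed if every in-flight event owns a thread stack.
constexpr std::uint64_t thread_model_bytes() { return kEvents * kMinStackBytes; }

// Memory needed if every in-flight event is a small queued task.
constexpr std::uint64_t task_model_bytes() { return kEvents * kAssumedTaskBytes; }

static_assert(thread_model_bytes() == 65'536'000'000ULL, "about 61 GiB of stacks");
static_assert(task_model_bytes() == 128'000'000ULL, "about 122 MiB of tasks");
```

Under these assumptions the thread model needs 512 times more memory for the same number of events, before counting any context-switch or cache cost.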
SeaStar current performance
[Figure: performance chart comparing the stock TCP stack with SeaStar's native TCP stack.]
Basic model
■ Futures
■ Promises
■ Continuations
F-P-C defined: Future
A future is a result of a computation that may not be available yet.
■ Data buffer from the network
■ Timer expiration
■ Completion of a disk write
■ Result of a computation that requires the values from one or more other futures
F-P-C defined: Promise
A promise is an object or function that provides you with a future, with the expectation that it will fulfil the future.
Basic future/promise
future<int> get();   // promises an int will be produced eventually
future<> put(int);   // promises to store an int

void f() {
    get().then([] (int value) {
        put(value + 1).then([] {
            std::cout << "value stored successfully\n";
        });
    });
}
Chaining
future<int> get();   // promises an int will be produced eventually
future<> put(int);   // promises to store an int

void f() {
    get().then([] (int value) {
        return put(value + 1);
    }).then([] {
        std::cout << "value stored successfully\n";
    });
}
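To experiment with the future/promise/continuation idea without Seastar, a toy version can be written in standard C++17. This is a minimal single-threaded sketch under stated assumptions: `toy_future`, `toy_promise`, `shared_state`, and `demo` are hypothetical names, continuations must return a plain value (there is no `future<>` and no unwrapping of a returned future as in the slide's `put()`), and each future accepts one continuation. It only shows that `then()` parks a lambda until the promise is fulfilled.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <optional>
#include <type_traits>
#include <utility>

// Toy, single-threaded stand-ins (hypothetical names); NOT Seastar's classes.
template <typename T>
struct shared_state {
    std::optional<T> value;                  // filled in when the result exists
    std::function<void(const T&)> on_ready;  // parked continuation, if any
};

template <typename T> class toy_promise;     // defined below

template <typename T>
class toy_future {
    std::shared_ptr<shared_state<T>> st_;
public:
    explicit toy_future(std::shared_ptr<shared_state<T>> st) : st_(std::move(st)) {}

    bool available() const { return st_->value.has_value(); }

    // Attach a continuation and get back a future for its result.
    template <typename F>
    auto then(F f) {
        using U = std::invoke_result_t<F, const T&>;
        auto next = std::make_shared<toy_promise<U>>();
        auto run = [next, f = std::move(f)](const T& v) mutable {
            next->set_value(f(v));
        };
        if (st_->value) {
            run(*st_->value);                // already resolved: run right away
        } else {
            st_->on_ready = std::move(run);  // not yet: park until set_value()
        }
        return next->get_future();
    }
};

template <typename T>
class toy_promise {
    std::shared_ptr<shared_state<T>> st_ = std::make_shared<shared_state<T>>();
public:
    toy_future<T> get_future() { return toy_future<T>(st_); }

    void set_value(T v) {
        assert(!st_->value && "a promise is fulfilled only once");
        st_->value = std::move(v);
        if (st_->on_ready) st_->on_ready(*st_->value);
    }
};

// Loosely mirrors the slide: produce a value, add one, then "store" it.
int demo() {
    toy_promise<int> p;
    int stored = 0;
    p.get_future()
        .then([](int value) { return value + 1; })
        .then([&stored](int value) { stored = value; return 0; });
    p.set_value(41);  // the "I/O" completes; both continuations run now
    return stored;
}
```

Here `demo()` returns 42: neither lambda runs when it is attached, and both run in order once `set_value(41)` is called.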
Zero copy friendly
future<temporary_buffer>
connected_socket::read(size_t n);
■ temporary_buffer points at driver-provided pages if possible
■ discarded after use
Zero copy friendly (2)
future<size_t>
connected_socket::write(temporary_buffer);
■ Future becomes ready when TCP window allows sending more data (usually immediately)
■ temporary_buffer discarded after data is ACKed
■ can call delete[] or decrement a reference count
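The two release policies mentioned above (calling delete[] or decrementing a reference count) can be sketched with a buffer that carries its own release action. This is an illustration in standard C++17, not Seastar's `temporary_buffer`: `toy_buffer`, `make_owned`, and `make_shared_view` are hypothetical names, and the "driver page" is just a `std::shared_ptr<char[]>`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <memory>
#include <utility>

// Toy stand-in for the idea behind temporary_buffer (hypothetical class).
class toy_buffer {
    char* data_ = nullptr;
    std::size_t size_ = 0;
    std::function<void()> release_;  // how to give the memory back
public:
    toy_buffer(char* data, std::size_t size, std::function<void()> release)
        : data_(data), size_(size), release_(std::move(release)) {}
    toy_buffer(toy_buffer&& o)
        : data_(o.data_), size_(o.size_), release_(std::move(o.release_)) {
        o.data_ = nullptr;
        o.size_ = 0;
        o.release_ = nullptr;        // the moved-from buffer releases nothing
    }
    toy_buffer(const toy_buffer&) = delete;
    toy_buffer& operator=(const toy_buffer&) = delete;
    ~toy_buffer() { if (release_) release_(); }

    const char* get() const { return data_; }
    std::size_t size() const { return size_; }
};

// Policy 1: the buffer owns a heap array and frees it with delete[].
toy_buffer make_owned(const char* src, std::size_t n) {
    char* p = new char[n];
    std::memcpy(p, src, n);
    return toy_buffer(p, n, [p] { delete[] p; });
}

// Policy 2: the buffer is a view into a larger shared region and only
// drops one reference when it is discarded; no bytes are copied.
toy_buffer make_shared_view(std::shared_ptr<char[]> page,
                            std::size_t offset, std::size_t n) {
    assert(page && "the shared region must exist");
    char* p = page.get() + offset;
    return toy_buffer(p, n, [page = std::move(page)]() mutable { page.reset(); });
}
```

With the second policy, a reader can hand out slices of a received page without copying, and the page is freed only after the last slice is discarded.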
Dual Networking Stack
[Figure: one networking API over two stacks. Seastar (native) stack: user-space TCP/IP, then an interface layer over DPDK, Virtio, and Xen; igb and ixgb appear at the driver level. POSIX (hosted) stack: Linux kernel (sockets).]
Disk I/O
■ Zero copy using Linux AIO and O_DIRECT
■ Some operations using worker threads (open() etc.)
■ Plans for direct NVMe support
Rich APIs
● HTTP Server
● HTTP Client
● RPC client/server
● map_reduce
● parallel_for_each
● distributed<>
● when_all()
● timers
More info
■ http://github.com/cloudius-systems/seastar
■ http://seastar-project.com
Thank you
@CloudiusSystems