Writing a fast HTTP parser

At Lisp Meetup #22

Eitaro Fukamachi

Thank you for coming.

I’m Eitaro Fukamachi (@nitro_idiot on Twitter, fukamachi on GitHub).

(and 'web-application-developer 'common-lisper)

We’re hiring! Tell @Rudolph_Miller.

fast-http

• HTTP request/response parser

• Written in portable Common Lisp

• Fast

• Chunked body parser

fast-http

Benchmarked with SBCL 1.2.5 / GCC v6.0.0

Let me tell you why I had to write a fast HTTP parser.

Wookie is slower than Node.js

• Wookie is twice as slow as Node.js.

• The profiling results showed that “WOOKIE:READ-DATA” was pretty slow.

• It was only calling “http-parse”, the HTTP parser Wookie uses.

The bottleneck was HTTP parsing.

Wookie is slower than Node.js

• Node.js’s HTTP parser is “http-parser”.

• Written in C.

• A generalized version of Nginx’s HTTP parser.

• Is it possible to beat it with Common Lisp?

Today, I’m going to talk about what I did to write a fast Common Lisp program.

5 important things

• Architecture

• Reducing memory allocation

• Choosing the right data types

• Benchmark & Profile

• Type declarations

5 important things: 1. Architecture

A brief introduction to HTTP

An HTTP request looks like…

    GET /media HTTP/1.1↵
    Host: somewrite.jp↵
    Connection: keep-alive↵
    Accept: */*↵
    ↵

• First line: GET /media HTTP/1.1

• Headers: Host, Connection, Accept

• Body (empty, in this case)

↵ = CR + LF

CRLF * 2 at the end of headers
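Because the header section always ends with CRLF * 2 (an empty line), a parser can tell whether all the headers have arrived by looking for that four-byte sequence. A minimal sketch in C (the language of the parsers fast-http is compared with); `find_header_end` is a hypothetical helper, not part of fast-http.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper. Returns the number of bytes up to and including the
 * empty line (CRLF CRLF) that ends the header section, or -1 if the buffer
 * does not hold a complete header section yet. */
long find_header_end(const char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0)
            return (long)(i + 4);
    }
    return -1;
}
```

For the request above, the function returns the full length of the message; if the final LF has not arrived yet, it returns -1.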

An HTTP response looks like…

    HTTP/1.1 200 OK↵
    Cache-Control: max-age=0↵
    Content-Type: text/html↵
    Date: Wed, 26 Nov 2014 04:52:55 GMT↵
    ↵
    <html> …

• Status line: HTTP/1.1 200 OK

• Headers: Cache-Control, Content-Type, Date

• Body: <html> …
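The status line has a fixed shape: version, status code, then a reason phrase. A minimal C sketch of parsing it, assuming single-digit major and minor version numbers and a three-digit status code; `parse_status_line` is a hypothetical name. It tolerates extra spaces before the status code.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parser for a status line such as "HTTP/1.1 200 OK"
 * (without its CRLF). On success it stores the version and status code
 * and returns 0; on malformed input it returns -1 and stores nothing. */
int parse_status_line(const char *line, size_t len,
                      int *major, int *minor, int *status)
{
    assert(line != NULL && major != NULL && minor != NULL && status != NULL);
    /* The shortest valid line, "HTTP/1.1 200", is 12 bytes long. */
    if (len < 12 || memcmp(line, "HTTP/", 5) != 0 ||
        !isdigit((unsigned char)line[5]) || line[6] != '.' ||
        !isdigit((unsigned char)line[7]) || line[8] != ' ')
        return -1;

    size_t i = 9;
    while (i < len && line[i] == ' ')
        i++;                              /* be lenient: allow extra spaces */
    if (len - i < 3 || !isdigit((unsigned char)line[i]) ||
        !isdigit((unsigned char)line[i + 1]) ||
        !isdigit((unsigned char)line[i + 2]))
        return -1;
    /* The code must be followed by a space (then the reason) or the end. */
    if (i + 3 < len && line[i + 3] != ' ')
        return -1;

    *major = line[5] - '0';
    *minor = line[7] - '0';
    *status = (line[i] - '0') * 100 + (line[i + 1] - '0') * 10 + (line[i + 2] - '0');
    return 0;
}
```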

HTTP is…

• A text-based protocol (not binary)

• Lines are terminated with CRLF

• Very lenient:

  • Multiple spaces are ignored

  • A header value can continue onto the following line
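Both lenient rules show up when normalizing a header value. A minimal C sketch, assuming the HTTP/1.1 convention that a continuation line starts with SP or HTAB; `unfold_value` is a hypothetical name. It ignores any number of leading and trailing spaces and turns each continuation into a single space, working in place so that no new string is allocated.

```c
#include <assert.h>
#include <stddef.h>

/* Is there a folded line break at s[i]: CRLF followed by SP or HTAB? */
static int is_fold(const char *s, size_t len, size_t i)
{
    return s[i] == '\r' && i + 2 < len && s[i + 1] == '\n' &&
           (s[i + 2] == ' ' || s[i + 2] == '\t');
}

/* Hypothetical helper: normalizes a raw header value in place and returns
 * its new length. Leading/trailing spaces and tabs are dropped, and each
 * continuation (CRLF plus the indentation after it) becomes one space. */
size_t unfold_value(char *s, size_t len)
{
    size_t i = 0, out = 0;
    assert(s != NULL || len == 0);
    while (i < len && (s[i] == ' ' || s[i] == '\t'))
        i++;                                  /* skip leading whitespace */
    while (i < len) {
        if (is_fold(s, len, i)) {
            i += 2;                           /* drop the CRLF */
            while (i < len && (s[i] == ' ' || s[i] == '\t'))
                i++;                          /* and the indentation */
            if (out > 0)
                s[out++] = ' ';
        } else {
            s[out++] = s[i++];                /* out never passes i */
        }
    }
    while (out > 0 && (s[out - 1] == ' ' || s[out - 1] == '\t'))
        out--;                                /* drop trailing whitespace */
    return out;
}
```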

And there’s another difficulty.

HTTP messages are sent over a network.

This means we have to deal with long and incomplete HTTP messages: a single message can arrive split across several reads.

There are two ways to solve this problem.

1. Stateful (http-parser)

http-parser (used in Node.js)

• https://github.com/joyent/http-parser

• Written in C

• Ported from Nginx’s HTTP parser

• Written for Node.js

• Stateful

http-parser (used in Node.js)

    for (p = data; p != data + len; p++) {
      …
      switch (parser->state) {
      case s_dead: …
      case s_start_req_or_res: …
      case s_res_or_resp_H: …
      }
    }

• Process char by char (the for loop)

• Do something for each state (the switch)
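The char-by-char, state-per-character approach can be sketched in C as follows. This is not http-parser’s code: the states and the `hdr_parser`/`hdr_feed` names are hypothetical, and the sketch only detects the end of the header section. Because the state lives in a struct between calls, a message split across several reads is processed without looking at any byte twice.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical states; a real parser has many more. */
enum hdr_state { S_LINE, S_CR, S_CRLF, S_CRLF_CR, S_DONE };

struct hdr_parser {
    enum hdr_state state;   /* kept between calls */
    size_t lines;           /* non-empty lines seen so far */
};

/* Feeds one chunk. Every byte is examined exactly once, even when the
 * message arrives in many chunks. Returns 1 once the empty line that ends
 * the headers has been seen, 0 if more data is needed. */
int hdr_feed(struct hdr_parser *p, const char *data, size_t len)
{
    assert(p != NULL && (data != NULL || len == 0));
    for (size_t i = 0; i < len && p->state != S_DONE; i++) {
        char c = data[i];
        switch (p->state) {
        case S_LINE:
            if (c == '\r') p->state = S_CR;
            break;
        case S_CR:
            if (c == '\n') { p->state = S_CRLF; p->lines++; }
            else if (c != '\r') p->state = S_LINE;
            break;
        case S_CRLF:                      /* at the start of a line */
            p->state = (c == '\r') ? S_CRLF_CR : S_LINE;
            break;
        case S_CRLF_CR:
            p->state = (c == '\n') ? S_DONE : (c == '\r') ? S_CR : S_LINE;
            break;
        case S_DONE:
            break;
        }
    }
    return p->state == S_DONE;
}
```

Splitting the input in awkward places, even between CR and LF, does not matter: the next call resumes from the saved state.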

2. Stateless (PicoHTTPParser)

PicoHTTPParser (used in H2O)

• https://github.com/h2o/picohttpparser

• Written in C

• Stateless

• Reparses from the beginning when the data is incomplete

• Most HTTP requests are small, so reparsing is rarely expensive
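The stateless approach keeps nothing between calls: each time more data arrives, the caller appends it to a buffer and parses the whole buffer again. A minimal C sketch, assuming a fixed 8 KiB buffer and NUL-free header text; `parse_stateless`, `struct conn`, and `conn_on_data` are hypothetical names, not PicoHTTPParser’s API.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stateless parser: it remembers nothing between calls.
 * `buf` must be NUL-terminated (strstr stops at a NUL byte). Returns the
 * length of the complete header section, or -2 if it is incomplete. */
long parse_stateless(const char *buf)
{
    const char *end = strstr(buf, "\r\n\r\n");
    return end ? (long)(end - buf) + 4 : -2;
}

/* Hypothetical per-connection buffer. */
struct conn {
    char buf[8192];
    size_t len;            /* bytes received so far; buf[len] is '\0' */
};

/* Appends a newly received chunk and reparses everything from byte 0.
 * Returns -1 if the request does not fit in this sketch's buffer. */
long conn_on_data(struct conn *c, const char *chunk, size_t n)
{
    assert(c != NULL && chunk != NULL);
    if (n >= sizeof c->buf - c->len)
        return -1;
    memcpy(c->buf + c->len, chunk, n);
    c->len += n;
    c->buf[c->len] = '\0';
    return parse_stateless(c->buf);
}
```

The design trades repeated work for simplicity: no state machine survives between reads, and the cost of reparsing stays small as long as requests are small.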

And fast-http is…

fast-http is in the middle

• It doesn’t track state for every character

• It sets the state once per line

• This keeps the program simple

• And easy to optimize
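Setting the state once per line can be sketched in C like this. The names are hypothetical and this is not fast-http’s actual code (fast-http is written in Common Lisp); the sketch only illustrates the idea. Complete lines are consumed and update the state, while an incomplete last line is left for the next call, so only that partial line is ever parsed twice.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical line-level states: the state changes only at line ends. */
enum line_state { L_FIRST_LINE, L_HEADERS, L_BODY };

struct line_parser {
    enum line_state state;
    size_t headers;          /* header lines seen so far */
};

/* Consumes every complete line in data[0..len) and returns the number of
 * bytes consumed. An incomplete last line is not consumed: the caller keeps
 * it and passes it again, followed by new data, on the next call. */
size_t parse_lines(struct line_parser *p, const char *data, size_t len)
{
    size_t pos = 0;
    assert(p != NULL && (data != NULL || len == 0));
    while (p->state != L_BODY && pos < len) {
        const char *lf = memchr(data + pos, '\n', len - pos);
        if (lf == NULL)
            break;                           /* incomplete line: wait */
        size_t end = (size_t)(lf - data);    /* index of the LF */
        size_t line_len = end - pos;
        if (line_len > 0 && data[end - 1] == '\r')
            line_len--;                      /* CRLF; a bare LF is also accepted */

        if (p->state == L_FIRST_LINE)
            p->state = L_HEADERS;            /* request/status line parsed here */
        else if (line_len == 0)
            p->state = L_BODY;               /* empty line: end of headers */
        else
            p->headers++;                    /* one "Name: value" line */

        pos = end + 1;
    }
    return pos;
}
```

For example, given "GET /media HTTP/1.1\r\nHost: somewrite.jp\r\nConn", the function consumes the first two lines and returns 41; the caller then passes "Connection: keep-alive\r\n\r\n" (the kept "Conn" plus the new bytes), and the parser reaches the body state.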

5 important things: 2. Reducing memory allocation

Memory allocation is slow (in general)

• Avoid allocating memory while processing data

• Functions that allocate include:

  • cons, make-instance, make-array…

  • subseq, append, copy-seq
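In C, the counterpart of avoiding subseq or copy-seq is to describe parts of the input by pointer and length instead of copying them into freshly allocated strings. A minimal sketch; `struct span` and `split_header` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A view into the caller's buffer: no bytes are copied. */
struct span { const char *ptr; size_t len; };

/* Hypothetical splitter for a "Name: value" header line (without CRLF).
 * Instead of malloc() + memcpy() for the name and value, it returns
 * pointers into `line`. Returns 0 on success, -1 if there is no colon. */
int split_header(const char *line, size_t len,
                 struct span *name, struct span *value)
{
    assert(line != NULL && name != NULL && value != NULL);
    const char *colon = memchr(line, ':', len);
    if (colon == NULL)
        return -1;
    name->ptr = line;
    name->len = (size_t)(colon - line);
    size_t i = name->len + 1;
    while (i < len && (line[i] == ' ' || line[i] == '\t'))
        i++;                              /* skip optional whitespace */
    value->ptr = line + i;
    value->len = len - i;
    return 0;
}
```

The spans stay valid only as long as the input buffer does, which is the usual price of zero-copy parsing.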

5 important things: 3. Choosing the right data types

Data types

• The wrong data type makes your program slow.

• List or vector?

• Hash table, structure, or class?

5 important things: 4. Benchmark & Profile

Benchmark is quite important

• “Don’t guess, measure!”

• Check whether each change actually improves performance.

• Benchmarking also keeps you motivated.
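A benchmark does not need to be elaborate to be useful. The C sketch below times many repetitions of one function with the standard `clock()`; `count_crlf` is a hypothetical stand-in for the code being measured, and the `volatile` sink keeps the compiler from optimizing the work away.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for the function under test. */
static size_t count_crlf(const char *s, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i + 1 < len; i++)
        if (s[i] == '\r' && s[i + 1] == '\n')
            n++;
    return n;
}

volatile size_t bench_sink;   /* results go here so the loop is not removed */

/* Runs the function `iterations` times on the same input and returns the
 * elapsed processor time in seconds, as measured by clock(). */
double bench(const char *input, size_t len, long iterations)
{
    assert(input != NULL && iterations > 0);
    clock_t start = clock();
    for (long i = 0; i < iterations; i++)
        bench_sink += count_crlf(input, len);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Run it before and after a change, with the same input on the same machine, and compare the numbers; repeating each measurement a few times helps separate real improvements from noise.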

Profiling

• SBCL has a built-in profiler

• (sb-profile:profile "FAST-HTTP" …)

• (sb-profile:report)

5 important things: 5. Type declarations

Type declaration

• Common Lisp has (optional) type declarations

• (declare (type <type> <variable symbol>))

  • It’s a hint to your Lisp compiler

• (declare (optimize (speed 3) (safety 0)))

  • It’s your wish, expressed to your Lisp compiler

See also: “Cより高速なCommon Lispコードを書く” (“Writing Common Lisp code that is faster than C”)

(safety 0)

• (safety 0) means “don’t check types or array indices at run time”.

• Fast & unsafe (like C)

• Is fixnum enough?

• What do you do when someone passes a bignum to the function?
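The fixnum question has a direct counterpart in an HTTP parser: a Content-Length header can contain more digits than any fixed-size integer holds. A minimal C sketch, using `uint64_t` in place of a fixnum; `parse_content_length` is a hypothetical name. Rather than silently wrapping around, it reports overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical Content-Length parser. Returns 0 on success, -1 on empty
 * input or a non-digit, -2 if the number does not fit in 64 bits. */
int parse_content_length(const char *s, size_t len, uint64_t *out)
{
    uint64_t n = 0;
    assert(s != NULL && out != NULL);
    if (len == 0)
        return -1;
    for (size_t i = 0; i < len; i++) {
        if (s[i] < '0' || s[i] > '9')
            return -1;
        unsigned d = (unsigned)(s[i] - '0');
        if (n > (UINT64_MAX - d) / 10)
            return -2;                  /* n * 10 + d would overflow */
        n = n * 10 + d;
    }
    *out = n;
    return 0;
}
```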

(safety 0)

• fast-http has 2 layers

  • Low-level API

    • (speed 3) (safety 0)

  • High-level API (safer)

    • Checks variable types

    • (speed 3) (safety 2)
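The two-layer design can be sketched in C: an unchecked function for the hot path and a checking wrapper for callers. The example parses a hexadecimal number (chunk sizes in chunked bodies are written in hex); both function names are hypothetical. The `assert` in the low-level layer is a debug-only check that disappears when compiled with `-DNDEBUG`, much as (safety 0) removes run-time checks.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdint.h>

/* Low-level layer (hypothetical): trusts its caller completely. It assumes
 * 1 <= len <= 15 and that every byte is a hex digit. */
static uint64_t hex_value_unchecked(const char *s, size_t len)
{
    assert(len >= 1 && len <= 15);      /* gone with -DNDEBUG */
    uint64_t n = 0;
    for (size_t i = 0; i < len; i++) {
        char c = s[i];
        unsigned d = (c <= '9') ? (unsigned)(c - '0')
                                : (unsigned)((c | 0x20) - 'a' + 10);
        n = (n << 4) | d;
    }
    return n;
}

/* High-level layer (hypothetical): validates its input once, at the API
 * boundary, then calls the fast path. Returns 0 on success, -1 otherwise. */
int hex_value(const char *s, size_t len, uint64_t *out)
{
    if (s == NULL || out == NULL || len == 0 || len > 15)
        return -1;                      /* 15 hex digits always fit */
    for (size_t i = 0; i < len; i++)
        if (!isxdigit((unsigned char)s[i]))
            return -1;
    *out = hex_value_unchecked(s, len);
    return 0;
}
```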

Attitude

• Write carefully.

• It’s possible to beat a C program (if the program is complicated enough).

• Don’t give up easily.

• Safety is more important than speed.

Thanks.

Eitaro Fukamachi, 8arrow.org, @nitro_idiot, fukamachi