HipHop Virtual Machine by Radu Murzea



DESCRIPTION

Presentation about the HipHop Virtual Machine given at Pentalog's headquarters in Cluj-Napoca on 20 February 2014.


Page 1: HipHop Virtual Machine



Agenda

Introduction
- What is HipHop VM?
- History and why it exists

Architecture and Features
- General Architecture
- Code cache
- JIT
- Garbage Collector
- AdminServer
- FastCGI
- Extensions
- HHVM-friendly PHP code
- Parity


What is HipHop VM ?

A high-level, stack-based virtual machine that executes PHP code

Created by Facebook in a (successful) attempt to reduce the load on their servers

New versions are released every 8 weeks, on a Thursday. Ten days before a release, the release branch is cut and heavily tested.


History of HHVM (I)

Summer 2007: Facebook started developing HPHPc, a PHP-to-C++ translator.

It worked by:
- Building an AST based on the PHP code
- Generating equivalent C++ code based on that AST
- Compiling the C++ code to a binary using g++
- Uploading the binary to the web servers, where it was executed

This resulted in significant performance improvements, up to 500% in some cases compared to PHP 5.2.
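To make the first two HPHPc steps more concrete, here is a toy sketch in PHP that walks an AST and emits equivalent C++ source. The array-based node format and the `emitCpp()` function are hypothetical, and the sketch assumes `$x` is always an integer; the real HPHPc AST and code generator were far more involved and had to model PHP's dynamic types.

```php
<?php
// Toy AST for the PHP expression ($x + 2) * 3, as nested arrays.
// This node format is a hypothetical illustration, not HPHPc's real AST.
$ast = ['op' => '*',
        'left'  => ['op' => '+',
                    'left'  => ['var' => 'x'],
                    'right' => ['lit' => 2]],
        'right' => ['lit' => 3]];

// Walk the AST recursively and emit an equivalent C++ expression.
function emitCpp(array $node) {
    if (isset($node['lit'])) {
        return (string) $node['lit'];   // literal: emitted as-is
    }
    if (isset($node['var'])) {
        return $node['var'];            // PHP's $x becomes C++'s x
    }
    // Binary operator: fully parenthesize to preserve evaluation order.
    return '(' . emitCpp($node['left']) . ' ' . $node['op'] . ' '
               . emitCpp($node['right']) . ')';
}

// Wrap the expression in a C++ function that g++ could then compile.
echo "long f(long x) { return " . emitCpp($ast) . "; }\n";
// prints: long f(long x) { return ((x + 2) * 3); }
```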


History of HHVM (II)

The success of HPHPc was so great that the engineers decided to give it a developer-friendly brother: HPHPi

HPHPi was just like HPHPc, but it ran in interpreted mode only (i.e. much slower)

However, it provided a lot of utilities for developers:
- A debugger (known as HPHPd), with watches and breakpoints
- Static code analysis
- Performance profiling

It also didn't require the compilation step to run the code

HPHPc ran over 90% of FB production code by the end of 2009

HPHPc was open-sourced in February 2010


History of HHVM (III)

But good performance came at a cost:
- Static compilation was very cumbersome
- The binary was 1 GB in size, which was a problem since production code had to be pushed to the servers DAILY
- Maintaining compatibility between HPHPc and HPHPi was getting more and more difficult (they used different formats for their ASTs)

So, at the beginning of 2010, FB started developing HHVM, a better, longer-term solution

At first, HHVM replaced only HPHPi, while HPHPc remained in production

But now, all of FB's production servers run HHVM

FB claims a 3x to 10x speed boost and a 0.5x to 5x memory reduction compared to PHP + APC. This, of course, is measured on their own code; most applications will see a more modest improvement


General Architecture (I)

The general architecture is made up of:
- 2 web servers
- A translator
- A JIT compiler
- A garbage collector

HHVM doesn't support every OS:
- It supports most flavours of Linux
- It has some support for Mac OS X (it only runs with the JIT turned off)
- There is no Windows support
- The OS must have a 64-bit architecture in order for HHVM to work


General Architecture (II)

HHVM follows these steps to execute a PHP script:
- Based on the PHP code, build an AST (the implementation for this was reused from HPHPc)
- Based on the AST, build HipHop Bytecode (HHBC), similar to Java's or the CLR's bytecode
- Cache the HHBC
- At runtime, pass the HHBC through the JIT compiler (if enabled), which transforms it into machine code
- Execute the machine code or, if the JIT is disabled, execute the HHBC in interpreted mode (not as fast, but still faster than Zend PHP)
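The pipeline above (AST, then bytecode, then execution), together with the "stack-based" description from the first slide, can be illustrated with a toy compiler and interpreter. The instruction names (`push`, `load`, `add`, `mul`) and the `$cache` array are hypothetical simplifications: they are not real HHBC opcodes, and the interpreter only stands in for HHVM's interpreted mode.

```php
<?php
// Compile an expression AST into a flat list of stack-machine instructions.
// Post-order traversal: operands are emitted before their operator.
function compile(array $node, array &$code) {
    if (isset($node['lit'])) {
        $code[] = ['push', $node['lit']];          // push a constant
    } elseif (isset($node['var'])) {
        $code[] = ['load', $node['var']];          // push a local variable
    } else {
        compile($node['left'], $code);
        compile($node['right'], $code);
        $code[] = [$node['op'] === '+' ? 'add' : 'mul'];  // toy: only + and *
    }
}

// Interpret the bytecode: each instruction pops its inputs from the
// operand stack and pushes its result back onto it.
function run(array $code, array $locals) {
    $stack = [];
    foreach ($code as $ins) {
        switch ($ins[0]) {
            case 'push': $stack[] = $ins[1]; break;
            case 'load': $stack[] = $locals[$ins[1]]; break;
            case 'add':
                $b = array_pop($stack); $a = array_pop($stack);
                $stack[] = $a + $b; break;
            case 'mul':
                $b = array_pop($stack); $a = array_pop($stack);
                $stack[] = $a * $b; break;
        }
    }
    return array_pop($stack);
}

// AST for ($x + 2) * 3
$ast = ['op' => '*',
        'left'  => ['op' => '+', 'left' => ['var' => 'x'], 'right' => ['lit' => 2]],
        'right' => ['lit' => 3]];

$cache = [];                      // stand-in for the bytecode cache
$code = [];
compile($ast, $code);             // compile once...
$cache['expr.php'] = $code;       // ...and reuse the bytecode on later requests

echo run($cache['expr.php'], ['x' => 5]), "\n";   // prints 21
```

Because every instruction takes its operands from the stack, the compiler never has to allocate registers; it just emits operands before operators.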


Code Cache (I)

When a request comes in, HHVM determines which file to serve, then checks whether the file's HHBC is in its SQLite-based cache:
- If yes, it is executed
- If no, HHVM compiles it, optimizes it and stores it in the cache

This is very similar to APC. There is a warm-up period when a new server is created, because the cache is empty. However, HHVM's cache lives on disk, so it survives server restarts, and there will be no more warm-up periods for that file
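A minimal sketch of the lookup / compile / store logic described above, assuming entries are keyed by file path and invalidated by comparing the file's modification time. A plain PHP array replaces HHVM's on-disk, SQLite-based cache, and `compileToBytecode()` is a hypothetical placeholder for the real parse-and-compile step.

```php
<?php
class CodeCache {
    private $entries = [];   // path => ['mtime' => int, 'bytecode' => string]
    public $compiles = 0;    // how many times we had to compile

    // Hypothetical placeholder for "build AST, emit HHBC, optimize".
    private function compileToBytecode($source) {
        $this->compiles++;
        return 'HHBC(' . md5($source) . ')';
    }

    public function lookup($path) {
        clearstatcache(true, $path);
        $mtime = filemtime($path);             // stat the file on every request
        if (isset($this->entries[$path])
            && $this->entries[$path]['mtime'] === $mtime) {
            return $this->entries[$path]['bytecode'];   // cache hit
        }
        // Cache miss (first request, or the file changed): compile and store.
        $bytecode = $this->compileToBytecode(file_get_contents($path));
        $this->entries[$path] = ['mtime' => $mtime, 'bytecode' => $bytecode];
        return $bytecode;
    }
}

$file = tempnam(sys_get_temp_dir(), 'hhvm');
file_put_contents($file, '<?php echo "hi";');

$cache = new CodeCache();
$cache->lookup($file);          // miss: compiles
$cache->lookup($file);          // hit: served from the cache
echo $cache->compiles, "\n";    // prints 1
unlink($file);
```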


Code Cache (II)

But the warm-up period can be bypassed by doing pre-analysis

Pre-analysis means the cache is generated before HHVM starts up

The pre-analyser actually works a little harder and does a better job of optimizing the code


Code Cache (III)

There is a mode called RepoAuthoritative mode. Normally, HHVM checks at each request whether the PHP file has changed, in order to know whether the cache must be updated

RepoAuthoritative mode means this check is not performed anymore

But be careful: if the file is not in the cache, you'll get an HTTP 404 error, even though the PHP file is right there

RepoAuthoritative mode is recommended for production because it avoids a lot of disk I/O, and files change rarely anyway
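Pre-analysis and RepoAuthoritative mode can be sketched together: a hypothetical `buildRepo()` compiles every file before the server starts, and the request handler then trusts that repository completely, without checking the file system. The function names, the array-based repo and the placeholder "compiled" strings are assumptions for illustration only.

```php
<?php
// Pre-analysis: compile every file before the server starts serving traffic.
// 'compiled:' . md5(...) is a hypothetical stand-in for real HHBC.
function buildRepo(array $sources) {
    $repo = [];
    foreach ($sources as $path => $code) {
        $repo[$path] = 'compiled:' . md5($code);
    }
    return $repo;
}

// RepoAuthoritative request handling: the repo is the only source of truth.
// No stat()/mtime check is made, so edits on disk are ignored, and a file
// that was not pre-compiled yields a 404 even if it exists on disk.
function handleRequest(array $repo, $path) {
    if (!isset($repo[$path])) {
        return [404, 'Not Found'];
    }
    return [200, $repo[$path]];
}

$repo = buildRepo(['/index.php' => '<?php echo "home";']);

echo handleRequest($repo, '/index.php')[0], "\n";   // 200
echo handleRequest($repo, '/new.php')[0], "\n";     // 404, even if /new.php is on disk
```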


JIT Compiler

Just-in-time compilation is done during execution, not before

It translates an intermediate form of the code (in this case HHBC) into machine code

A JIT compiler constantly checks which code paths are executed most frequently and tries to optimize those as much as possible

Since a JIT compiler compiles to machine code at runtime, the resulting machine code can be optimized for that specific platform or CPU, which sometimes makes it faster than even statically compiled code
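The "find the hot paths" idea can be sketched as a simple profiling counter: a function starts out on a slow path, and once it has been called a threshold number of times it is swapped for an "optimized" version. The `ToyJit` class and the threshold of 3 are hypothetical; a real JIT generates new machine code at that point, whereas here the optimized version is just a second PHP closure, so only the bookkeeping is illustrated.

```php
<?php
class ToyJit {
    const HOT_THRESHOLD = 3;      // hypothetical; real JITs tune this carefully
    private $counts = [];
    private $compiled = [];
    private $slow;
    private $fast;

    public function __construct(array $slow, array $fast) {
        $this->slow = $slow;      // name => "interpreted" implementation
        $this->fast = $fast;      // name => "machine code" implementation
    }

    public function call($name, array $args) {
        if (isset($this->compiled[$name])) {
            return call_user_func_array($this->compiled[$name], $args);
        }
        // Profile: count how often this function runs on the slow path.
        $this->counts[$name] = (isset($this->counts[$name]) ? $this->counts[$name] : 0) + 1;
        if ($this->counts[$name] >= self::HOT_THRESHOLD) {
            $this->compiled[$name] = $this->fast[$name];   // hot: "compile" it
        }
        return call_user_func_array($this->slow[$name], $args);
    }

    public function isCompiled($name) {
        return isset($this->compiled[$name]);
    }
}

$jit = new ToyJit(
    // Slow path: squares a non-negative int by repeated addition.
    ['square' => function ($x) { $r = 0; for ($i = 0; $i < $x; $i++) { $r += $x; } return $r; }],
    // Fast path: same result, computed directly.
    ['square' => function ($x) { return $x * $x; }]
);
for ($i = 1; $i <= 5; $i++) {
    $jit->call('square', [$i]);
}
var_dump($jit->isCompiled('square'));   // bool(true)
```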


JIT Compiler (II)

HHVM uses so-called tracelets as the basic unit of JIT compilation. A tracelet is usually a loop, because most programs spend most of their time in some “hot loops”, and subsequent iterations of those loops take similar paths

A tracelet has 3 parts:
- Type guard(s): prevent execution for incompatible types
- Body
- Link to subsequent tracelet(s)

Each tracelet has great freedom, but it is required to restore the VM to a consistent state any time execution escapes it

Tracelets have only ONE execution path, which means no control flow, which makes them easy to optimize
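A tracelet's guard and body can be sketched as a pair of closures: the guard checks the runtime types, and the branch-free body is specialized for those types. When the guard fails, execution escapes to a generic path. The `Tracelet` class and `runTracelet()` are hypothetical; the link to subsequent tracelets is omitted for brevity, and HHVM's real guards and bodies are generated machine code, not PHP.

```php
<?php
// A toy tracelet: a type guard plus a straight-line, type-specialized body.
class Tracelet {
    public $guard;   // callable(array $vals): bool
    public $body;    // callable(array $vals): mixed, no branches inside

    public function __construct(callable $guard, callable $body) {
        $this->guard = $guard;
        $this->body = $body;
    }
}

// Generic path: handles any operand types, like the interpreter would.
function genericAdd(array $v) {
    return $v[0] + $v[1];
}

// The guard runs before the body changes anything, so escaping to the
// generic path at this point leaves no state to undo.
function runTracelet(Tracelet $t, array $vals) {
    if (!call_user_func($t->guard, $vals)) {
        return ['generic', genericAdd($vals)];
    }
    return ['tracelet', call_user_func($t->body, $vals)];
}

$intAdd = new Tracelet(
    function (array $v) { return is_int($v[0]) && is_int($v[1]); },  // type guard
    function (array $v) { return $v[0] + $v[1]; }                    // int-only body
);

echo implode(': ', runTracelet($intAdd, [2, 3])), "\n";     // tracelet: 5
echo implode(': ', runTracelet($intAdd, [2.5, 3])), "\n";   // generic: 5.5
```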


Garbage Collector

Most modern languages have automatic memory management

In the case of VMs, this is handled by a garbage collector (GC). There are 2 major types of GC:
- Refcounting: for each object, a count keeps track of how many references point to it; when the count drops to zero, the object is freed
- Tracing: periodically, during execution, the GC scans the objects and determines whether each one is still reachable. If not, it deletes it

Tracing is easier to implement and more efficient, but PHP's semantics rely on refcounting (for example, copy-on-write arrays and destructors that run as soon as the last reference disappears), so HHVM uses refcounting

FB engineers want to move to a tracing approach, and they might get it done someday
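To show what refcounting means in practice, here is a toy heap that keeps a reference count per object and frees an object the moment its count drops to zero. It also shows the classic weakness that tracing collectors avoid: two objects that point to each other never reach zero on their own. The `ToyHeap` class and its methods are hypothetical; HHVM implements refcounting in C++ on its own value representation.

```php
<?php
class ToyHeap {
    private $refcount = [];   // object id => number of references
    private $edges = [];      // object id => ids it points to
    public $freed = [];       // ids released, in order

    public function alloc($id) {
        $this->refcount[$id] = 1;   // the creator holds the first reference
        $this->edges[$id] = [];
    }

    public function incRef($id) {
        $this->refcount[$id]++;
    }

    public function decRef($id) {
        if (--$this->refcount[$id] === 0) {
            // Count hit zero: free immediately, then drop the references
            // this object held, which may free further objects.
            $this->freed[] = $id;
            foreach ($this->edges[$id] as $child) {
                $this->decRef($child);
            }
            unset($this->refcount[$id], $this->edges[$id]);
        }
    }

    // Make $from point to $to (like $from->prop = $to).
    public function link($from, $to) {
        $this->edges[$from][] = $to;
        $this->incRef($to);
    }

    public function isLive($id) {
        return isset($this->refcount[$id]);
    }
}

$heap = new ToyHeap();
$heap->alloc('a');
$heap->alloc('b');
$heap->link('a', 'b');   // b now has 2 refs: its local variable and a
$heap->decRef('b');      // b's local variable goes away: b survives (1 ref)
$heap->decRef('a');      // a is freed, which releases b as well
echo implode(',', $heap->freed), "\n";   // a,b

// A cycle: x -> y -> x. Dropping both locals leaves each count at 1.
$heap->alloc('x');
$heap->alloc('y');
$heap->link('x', 'y');
$heap->link('y', 'x');
$heap->decRef('x');
$heap->decRef('y');
var_dump($heap->isLive('x'), $heap->isLive('y'));   // both still live: leaked
```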


AdminServer

HHVM will actually start 2 web servers:
- The regular one, on port 80
- The AdminServer, on the port you specify

It can be accessed at a URI like http://localhost:9191/check-health?auth=mypasshaha

The AdminServer can turn the JIT on/off and show statistics about traffic, queries, memcache, CPU load, the number of active threads and much more
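A small sketch of building AdminServer URLs like the one above. The host, port (9191), password and the `check-health` command are just the example values from the slide; other command names should be checked against the HHVM documentation for your version. `adminUrl()` is a hypothetical helper.

```php
<?php
// Hypothetical helper: build an AdminServer URL for a given command.
function adminUrl($host, $port, $command, $password) {
    return sprintf('http://%s:%d/%s?%s',
        $host, $port, $command,
        http_build_query(['auth' => $password]));   // URL-encodes the password
}

$url = adminUrl('localhost', 9191, 'check-health', 'mypasshaha');
echo $url, "\n";   // http://localhost:9191/check-health?auth=mypasshaha

// Against a running HHVM with an AdminServer on that port, the response
// could then be fetched with e.g. file_get_contents($url).
```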


FastCGI

HHVM supports FastCGI starting with version 2.3.0 (released in December 2013)

FastCGI is a protocol that web servers use to communicate with other applications

Support for FastCGI means we don't have to use HHVM's own, rather poor web server; instead we can use something like Apache or nginx and let HHVM do what it does best: execute PHP code at lightning speed

Supporting FastCGI will help HHVM enter even more production systems and increase its popularity
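As a taste of what the protocol looks like on the wire: per the FastCGI specification, every message is a record that starts with a fixed 8-byte header (version, record type, request id, content length, padding length, reserved byte). The sketch below packs a BEGIN_REQUEST record, the first record a web server such as nginx sends for a request; it covers only this one record type and is not a complete FastCGI client.

```php
<?php
const FCGI_VERSION_1 = 1;
const FCGI_BEGIN_REQUEST = 1;
const FCGI_RESPONDER = 1;

// 8-byte record header: version, type, requestId (16-bit big-endian),
// contentLength (16-bit big-endian), paddingLength, reserved.
function fcgiHeader($type, $requestId, $contentLength) {
    return pack('CCnnCC', FCGI_VERSION_1, $type, $requestId, $contentLength, 0, 0);
}

// BEGIN_REQUEST body: role (16-bit), flags (8-bit), then 5 reserved bytes.
// Flag value 1 (FCGI_KEEP_CONN) asks the application to keep the connection open.
function fcgiBeginRequest($requestId, $keepConn) {
    $body = pack('nC', FCGI_RESPONDER, $keepConn ? 1 : 0) . str_repeat("\0", 5);
    return fcgiHeader(FCGI_BEGIN_REQUEST, $requestId, strlen($body)) . $body;
}

$record = fcgiBeginRequest(1, false);
echo bin2hex($record), "\n";   // 01010001000800000001000000000000
```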


Extensions

HHVM supports extensions just like PHP does. They can be written in PHP, C++ or a combination of the two

Extensions are loaded at each request, so you don't have to keep loading an extension all over your application

To use a custom extension, you add it to HHVM's extensions and then recompile HHVM. The resulting binary will contain your extension and you can then use it

By default, HHVM already contains the most popular extensions, like MySQL, PDO, DOM, cURL, PHAR, SimpleXML, JSON, memcache and many others

However, it doesn't include MySQLi at this time
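Because the set of bundled extensions differs between HHVM and Zend PHP (MySQLi being the example above), a portable application can check at runtime which extensions are available, using PHP's built-in `extension_loaded()`. The helper names and the "prefer PDO, fall back to MySQLi" choice are illustrative, not an HHVM requirement.

```php
<?php
// Return the subset of $required extensions that are not loaded.
function missingExtensions(array $required) {
    $missing = [];
    foreach ($required as $ext) {
        if (!extension_loaded($ext)) {
            $missing[] = $ext;
        }
    }
    return $missing;
}

// Pick a MySQL driver that exists on this runtime, or null if none does.
function pickMysqlDriver() {
    if (extension_loaded('pdo_mysql')) {
        return 'pdo_mysql';
    }
    if (extension_loaded('mysqli')) {
        return 'mysqli';
    }
    return null;
}

print_r(missingExtensions(['json', 'curl', 'mysqli']));
var_dump(pickMysqlDriver());
```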


HHVM-friendly Code (I)

Write code that HHVM can understand without running it, i.e. code that contains as much static detail as possible

Avoid things like:
- Dynamic function calls: $function_name()
- Dynamic variable names: $a = $$x + 1;
- Functions like compact(), get_defined_vars(), extract(), etc.
- Accessing dynamic properties of an object. If you want to access a property, declare it. Accessing dynamic properties requires hash table lookups, which are much slower.

Where possible, provide:
- Type hints for function parameters
- Return types that are as obvious as possible, e.g. return ($x == 4); or return ($boolVar ? 1 : -1);
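The two lists above can be contrasted in a small before/after sketch. Both functions compute the same result; the first relies on a dynamic function call and a variable variable, which HHVM cannot resolve without running the code, while the second uses declared properties, a class type hint and a direct call. The `Point` class and function names are made up for illustration, and no particular speedup is claimed for this tiny example.

```php
<?php
// Harder to analyze statically: the callee and the variable name
// are only known at runtime.
function distanceDynamic($p, $fn, $var) {
    $$var = $p->x * $p->x + $p->y * $p->y;   // variable variable
    return $fn($$var);                       // dynamic function call
}

// Friendlier: declared properties, a class type hint and a direct call.
class Point {
    public $x;
    public $y;

    public function __construct($x, $y) {
        $this->x = $x;
        $this->y = $y;
    }
}

function distanceStatic(Point $p) {
    return sqrt($p->x * $p->x + $p->y * $p->y);
}

function isOrigin(Point $p) {
    return ($p->x == 0 && $p->y == 0);   // obviously returns a bool
}

$p = new Point(3, 4);
echo distanceStatic($p), "\n";                  // 5
echo distanceDynamic($p, 'sqrt', 'sq'), "\n";   // 5
```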


HHVM-friendly Code (II)

Code that runs in the global scope is never JIT-ed. Any code anywhere can mutate the variables in the global scope, so, since PHP is weakly typed, it is impossible for the JIT compiler to predict a variable's type

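Example (after echo $b; runs, $a is no longer an int but a string):

```php
<?php
class B {
    public function __toString() {
        $GLOBALS['a'] = 'Hello, world !'; // silently turns $a into a string
        return 'B';                       // __toString() must return a string
    }
}
$a = 5;      // $a is an int here...
$b = new B;
echo $b;     // ...and a string after this line
```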
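Following from this, a common way to give the JIT a chance is to move hot code out of the global scope and into a function, where the loop's variables are locals that cannot be reached through `$GLOBALS`, so their types are easier to infer. A minimal sketch; the name `main()` is just a convention, not something HHVM requires.

```php
<?php
// Hot loop in the global scope: $sum and $i are globals, so any called code
// could change them through $GLOBALS, and their types cannot be assumed:
//   for ($i = 0; $i < 1000; $i++) { $sum += $i; }

// The same loop inside a function: $sum and $i are locals that only this
// function touches, so they stay integers for the whole loop.
function main() {
    $sum = 0;
    for ($i = 0; $i < 1000; $i++) {
        $sum += $i;
    }
    return $sum;
}

echo main(), "\n";   // prints 499500
```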


Parity (I)

All this is great, but can HHVM actually run real-world code? Well, in December 2013, it looked like this (taken from the HHVM blog):


Parity (II)

The main goal of HHVM's engineers is to be able to run all PHP frameworks by Q4 2014 or Q1 2015.


Q & A
