Perl 6 Internals Dan Sugalski TPC 5.0 “Here there be dragons”


The big goals of perl 6's internals

Speed

Extendibility

Cleanliness

Compatibility

Modularity

Thread Safety

Flexibility

Some global decisions

The core will be in C. (Like it or not, it's appropriate for code at this level)

The core must be modular, so pieces can be swapped out without rebuilding

It must be fast

Long-term binary compatibility is a must

Your average perl coder or extension writer shouldn't need any info about the guts

Things should generally be thought out, documented, and engineered

The quick overview

Parser

Compiler

Optimizer

Runtime engine

[Diagram: Parser, Compiler, Optimizer, and Interpreter stages; labels: Syntax Tree, Unoptimized Bytecode, Optimized Bytecode, Fully-laden Interpreter, Precompiled Bytecode]

The parser

Where the whole thing starts

Generally takes source of some sort and turns it into a syntax tree

The Bytecode Compiler

Turns a syntax tree into bytecode

Performs some simple optimization

The optimizer

Takes the plain bytecode from the compiler and abuses it heavily

An optional step, generally skipped for compile-and-go execution

Should be able to work on small parts of a program for JIT optimization

The Interpreter

Takes compiled (and possibly optimized) bytecode and does something with it

Generally that something is execute, but it might also be: save to disk, translate to another format (.NET, Java bytecode), or compile to machine code

The Parser

“Double, double, toil and trouble
Fire burn, and cauldron bubble”

Parser goals

Extendible in perl

More powerful than what we have now

Retargetable

Self-contained and removable

Parsing perl isn't easy

May well be one of the toughest languages to properly parse

If we get perl right other languages are easy. Or at least easier

We have the full power of perl to draw on to do the parsing (Including the regex engine and Damian's Bizarre Idea de Jour)

The parser will be in C

We will be using C for the parser

A full set of callbacks will be available to hook into the parser in lots of places

Adding new parsing rules (probably with regexes describing them) will be easy

The parser will be extendable via perl code

The Compiler

“Mmmmm, tasty!”

From syntax tree to bytecode

The compiler takes a syntax tree and turns it into bytecode

Very little optimization is done here. Optimization is expensive and optional

Pretty straightforward; this isn't rocket science

The Optimizer

“We can rebuild it. Make it better, faster, stronger”

The Optimizer

Takes plain bytecode and makes it faster

Does all the sorts of things that you expect an optimizer to do: code motion, loop unrolling, common subexpression work, etc.

Will be an iterative process

This will be interesting, as perl's a pain to optimize

An optional step, of course

Things that make optimizing perl tough

Active data

Runtime redefinitions of everything

Really, really late binding (Waiting for Godot late)

Perl programmers are used to more predictable runtime characteristics than, say, C programmers.

The Interpreter

“Polly want a cracker?”

Interpreter goals

Fast

Tuned for perl

Language neutral where possible

Event capable

Sandboxable

Asynchronous I/O built in

Built with an eye towards TIL and/or native code compilation

Better debugging support than perl 5

The perl 6 interpreter is a software CPU

Complete with registers and an assembly language

This can make translating perl 6 bytecode into native machine code easier

There's a lot of literature on building optimizing compilers that can be leveraged

While more complex than a pure stack-based machine, it's also faster

Opcode dispatch needs to be faster than perl 5

Opcode functions can be written in perl

CPU specs

64 int, float, string, and PMC registers

A segmented multiple stack architecture

Interrupt-capable (for events)

Pretty much completely position independent: everything is referenced via register, pad entry, or name
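As a rough illustration of what a register-based software CPU with table-driven opcode dispatch might look like, here is a minimal sketch. Only the register count and the four register types come from the slide. The names `Interp`, `op_fn`, and `run`, the three-opcode instruction set, and the bytecode encoding are all invented for the example, and string and PMC registers are left as opaque pointers.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS 64  /* from the slide: 64 registers of each type */

/* Hypothetical interpreter state: one register file per type. */
typedef struct Interp {
    long   int_reg[NUM_REGS];
    double num_reg[NUM_REGS];
    void  *str_reg[NUM_REGS];  /* opaque stand-in for string registers */
    void  *pmc_reg[NUM_REGS];  /* opaque stand-in for PMC registers */
} Interp;

/* Each opcode function executes one instruction and returns the address
   of the next one, or NULL to halt. Operands follow the opcode number. */
typedef const long *(*op_fn)(Interp *, const long *pc);

enum { OP_END, OP_SET_I, OP_ADD_I, OP_COUNT };  /* invented opcode set */

static const long *op_end(Interp *in, const long *pc) {
    (void)in; (void)pc;
    return NULL;
}

/* set_i Rd, constant */
static const long *op_set_i(Interp *in, const long *pc) {
    in->int_reg[pc[1]] = pc[2];
    return pc + 3;
}

/* add_i Rd, Ra, Rb */
static const long *op_add_i(Interp *in, const long *pc) {
    in->int_reg[pc[1]] = in->int_reg[pc[2]] + in->int_reg[pc[3]];
    return pc + 4;
}

static const op_fn op_table[OP_COUNT] = { op_end, op_set_i, op_add_i };

/* Dispatch loop: index the table by opcode number until an op halts.
   Register operands are trusted here; real bytecode would be validated. */
static void run(Interp *in, const long *pc) {
    while (pc != NULL) {
        assert(*pc >= 0 && *pc < OP_COUNT);
        pc = op_table[*pc](in, pc);
    }
}
```

A program is then just an array of longs, for example `{ OP_SET_I, 0, 2, OP_SET_I, 1, 40, OP_ADD_I, 2, 0, 1, OP_END }`, passed to `run` along with a zeroed `Interp`.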

The regex engine

The regex engine is going to be part of the perl 6 CPU, not separate as it is now

A good incentive to get opcode dispatch fast

Makes expanding the regex engine a bit easier

Details will be hidden as a set of regex opcodes

A few words on the stack system

Each register file has an associated stack

All registers of a particular type can be pushed onto or popped off the stack in one go

Individual registers or groups of registers can be pushed or popped

The stacks are all segmented so we're not relying on finding contiguous chunks of memory for them

There's also a set of call and scratch stacks
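One way to picture a segmented register stack is a linked list of fixed-size segments, each holding a few saved copies of a register file. The sketch below covers only the integer registers. The names `IntFrame`, `Segment`, `IntStack`, and the segment size are assumptions made for illustration, not part of the design described in the slides.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NUM_REGS       64
#define FRAMES_PER_SEG 4   /* arbitrary; small so segments chain quickly */

/* One saved copy of the whole integer register file. */
typedef struct { long reg[NUM_REGS]; } IntFrame;

/* A stack segment: a fixed block of frames linked to the segment below. */
typedef struct Segment {
    struct Segment *prev;
    size_t          used;
    IntFrame        frame[FRAMES_PER_SEG];
} Segment;

typedef struct { Segment *top; } IntStack;

/* Push all integer registers in one go. Returns 0 on success, -1 on OOM. */
static int push_int_regs(IntStack *s, const long regs[NUM_REGS]) {
    if (s->top == NULL || s->top->used == FRAMES_PER_SEG) {
        Segment *seg = malloc(sizeof *seg);
        if (seg == NULL) return -1;
        seg->prev = s->top;
        seg->used = 0;
        s->top = seg;
    }
    memcpy(s->top->frame[s->top->used++].reg, regs, sizeof(long) * NUM_REGS);
    return 0;
}

/* Pop all integer registers in one go; frees a segment once it empties. */
static void pop_int_regs(IntStack *s, long regs[NUM_REGS]) {
    assert(s->top != NULL && s->top->used > 0);
    memcpy(regs, s->top->frame[--s->top->used].reg, sizeof(long) * NUM_REGS);
    if (s->top->used == 0) {
        Segment *empty = s->top;
        s->top = empty->prev;
        free(empty);
    }
}
```

Because each segment is allocated separately, the stack can grow without ever needing one large contiguous block, which is the property the slide calls out.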

Bytecode

“Could you say that a little differently?”

What is bytecode?

A distilled version of a program

Machine language for the PVM

Can contain a lot of 'extra' information, including full source

Designed to be platform independent

Should be mostly mappable as shared data (modulo the fixup sections)

Data Structures

“Vtables and strings and floats, oh my!”

Variables

Vtable Pointer

Data Pointer

Integer Value

Float Value

Flags

Synchronization

GC Data

Generically called a PMC

Bigger than Perl 5's base data structure

Synchronization data built-in

Same for all variable types

GC data is part of base structure
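The field list on this slide could be written down as a C struct along the following lines. The field order follows the slide; every type, the trimmed two-entry `VTable`, and the `pmc_init_int` helper are assumptions made for the sake of a compilable sketch.

```c
#include <assert.h>
#include <stddef.h>

struct PMC;

/* Hypothetical, heavily trimmed vtable: a real one would have many entries. */
typedef struct VTable {
    const char *type_name;
    long (*get_integer)(struct PMC *self);
} VTable;

/* Field order follows the slide; every type here is an assumption. */
typedef struct PMC {
    const VTable *vtable;        /* Vtable Pointer */
    void         *data;          /* Data Pointer */
    long          integer_value; /* Integer Value */
    double        float_value;   /* Float Value */
    unsigned      flags;         /* Flags */
    void         *sync;          /* Synchronization (e.g. a lock handle) */
    void         *gc_data;       /* GC Data */
} PMC;

static long int_get_integer(PMC *self) { return self->integer_value; }

static const VTable int_vtable = { "Integer", int_get_integer };

/* Initialize a PMC as a plain integer scalar. */
static void pmc_init_int(PMC *p, long value) {
    assert(p != NULL);
    p->vtable = &int_vtable;
    p->data = NULL;
    p->integer_value = value;
    p->float_value = 0.0;
    p->flags = 0;
    p->sync = NULL;
    p->gc_data = NULL;
}
```

All access then goes through the vtable, as in `p.vtable->get_integer(&p)`, so callers never need to know which variable type they are holding.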

Scalars

Built off the base PMC structure

Use the integer and float areas as caches

Data pointer points off to string, large int, or large float

Vtable functions determine how it all works

Arrays

Built off the base PMC structure

Data pointer points to array data

All perl 6 arrays are typed

May have an array of scalars, strings, integers, or floats

Arrays only take up enough memory to hold their types

Hashes

Built off the base PMC structure

Data pointer points to hash data

All perl 6 hashes are typed

May have a hash of scalars, strings, integers, or floats

Hashes only take up enough memory to hold their types

Hashing function is overridable

Strings

Encoding

Type

Buffer Start

Buffer Length

String Length

String Size

Flags

Unused

Strings are sort of abstract

Perl 6 can mix and match string data (Unicode, ASCII, EBCDIC, etc)

New string types can be loaded on the fly

String handling

Perl 6 has no 'built-in' string support—all string support is via loadable libraries

There'll be Unicode, ASCII, and EBCDIC support provided (at least) to start

Numbers

Bigints and bigfloats share the same header

Arbitrary-length floating point and integer numbers are supported

Perl automagically upgrades ints and floats when needed

Buffer Pointer

Length

Exponent

Flags
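The automatic upgrade of native numbers can be sketched as an overflow check in front of each arithmetic operation. In the sketch below a `double` stands in for the arbitrary-length representation, which is a simplification and not what the slides describe. The `Number` type and both function names are hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

typedef struct {
    bool   is_big;  /* true once the value no longer fits in a native long */
    long   small;   /* valid when is_big is false */
    double big;     /* stand-in for an arbitrary-precision representation */
} Number;

/* True if a + b would overflow a long; checked without performing the add. */
static bool add_overflows(long a, long b) {
    return (b > 0 && a > LONG_MAX - b) || (b < 0 && a < LONG_MIN - b);
}

/* Add two native integers, upgrading the result only when needed. */
static Number add_ints(long a, long b) {
    Number r = { false, 0, 0.0 };
    if (add_overflows(a, b)) {
        r.is_big = true;
        r.big = (double)a + (double)b;
    } else {
        r.small = a + b;
    }
    return r;
}
```

The common case stays on the native fast path, and only the rare overflowing operation pays for the larger representation.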

Vtables

All variable data access is done through a table of functions that the variable carries around with it

This allows us faster access, since code paths are specialized for just the functions they need to perform

Isolates us from the implementation of variables internally

Allows special purpose behaviour (like perl 5's magic) to be attached without cost to the rest of perl

Vtables (cont'd)

Makes thread safety easier

A little bit more overhead because of the extra level of indirection, but the smaller functions make up for that

Vtable functions can be written in perl. (Each class with objects blessed into it will have at least one)

There may be more than one vtable per package

Vtables hide data manipulation

Pretty much all the code to handle data manipulation will be done via variable vtables

This allows the variable implementation to change without perl needing to know

Allows far more flexibility in what you can make a variable do

Shortens the code path for data functions and trims out extraneous conditionals

For example: Fetching the string value of a scalar

For scalars with strings:

String *get_str(PMC *my_PMC) {
    /* The string is already there; just return it. */
    return my_PMC->data_pointer;
}

For int-only scalar:

String *get_str(PMC *my_PMC) {
    /* Build the string once, then switch vtables so later calls take the short path. */
    my_PMC->data_pointer = make_string(my_PMC->integer);
    my_PMC->vtable = int_and_string_vtable;
    return my_PMC->data_pointer;
}
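The two fragments above can be fleshed out into something that compiles. This is a minimal sketch under stated assumptions: `get_str`, `make_string`, `int_and_string_vtable`, `data_pointer`, and `integer` are the slide's names, while the `String` layout, the one-entry `VTable`, the function names `cached_get_str` and `int_only_get_str`, and the error handling are invented here.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct String { char *buf; } String;   /* simplified stand-in */
typedef struct PMC PMC;

typedef struct VTable {
    String *(*get_str)(PMC *self);
} VTable;

struct PMC {
    const VTable *vtable;
    void         *data_pointer;
    long          integer;
};

/* Hypothetical helper: format an integer into a freshly allocated String. */
static String *make_string(long value) {
    char tmp[32];
    String *s = malloc(sizeof *s);
    if (s == NULL) return NULL;
    int len = snprintf(tmp, sizeof tmp, "%ld", value);
    s->buf = malloc((size_t)len + 1);
    if (s->buf == NULL) { free(s); return NULL; }
    memcpy(s->buf, tmp, (size_t)len + 1);
    return s;
}

/* Scalar that already has a string: just hand it back. */
static String *cached_get_str(PMC *p) {
    return p->data_pointer;
}
static const VTable int_and_string_vtable = { cached_get_str };

/* Int-only scalar: build the string once, then switch vtables so later
   calls take the short path above. */
static String *int_only_get_str(PMC *p) {
    String *s = make_string(p->integer);
    if (s == NULL) return NULL;     /* leave the PMC unchanged on failure */
    p->data_pointer = s;
    p->vtable = &int_and_string_vtable;
    return s;
}
static const VTable int_only_vtable = { int_only_get_str };
```

A caller always writes `p.vtable->get_str(&p)`. The first call on an int-only scalar does the conversion, and every later call returns the cached string with no conditional in the path.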

Memory Management

“Now where did I put that?”

Getting headers

All the fixed-size things (PMCs, string/number headers) get allocated from arenas

All headers, with the exception of PMCs (maybe) are moveable by the garbage collector

Non-PMC header allocation is very fast

PMC allocation is only mostly fast
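Arena allocation of fixed-size headers is commonly done with a free list threaded through the unused slots. The following is a generic sketch of that technique, not the allocator the slides have in mind. The `Header` layout, `HeaderPool`, and the arena size are all assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define HEADERS_PER_ARENA 256   /* arbitrary */

/* Stand-in for a fixed-size header; free slots reuse it as a list link. */
typedef union Header {
    union Header *next_free;
    struct { void *buf; size_t len; } h;
} Header;

typedef struct Arena {
    struct Arena *next;
    Header        slot[HEADERS_PER_ARENA];
} Arena;

typedef struct {
    Arena  *arenas;     /* every arena allocated so far */
    Header *free_list;  /* unused slots across all arenas */
} HeaderPool;

/* Carve a new arena into free slots. Returns 0 on success, -1 on OOM. */
static int pool_grow(HeaderPool *p) {
    Arena *a = malloc(sizeof *a);
    if (a == NULL) return -1;
    a->next = p->arenas;
    p->arenas = a;
    for (size_t i = 0; i < HEADERS_PER_ARENA; i++) {
        a->slot[i].next_free = p->free_list;
        p->free_list = &a->slot[i];
    }
    return 0;
}

/* Common case is a single pointer pop from the free list. */
static Header *header_alloc(HeaderPool *p) {
    if (p->free_list == NULL && pool_grow(p) != 0) return NULL;
    Header *h = p->free_list;
    p->free_list = h->next_free;
    return h;
}

/* Return a header to the pool by pushing it back on the free list. */
static void header_free(HeaderPool *p, Header *h) {
    assert(h != NULL);
    h->next_free = p->free_list;
    p->free_list = h;
}
```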

Buffer Management

Anything that isn't a fixed size gets allocated from the buffer pools

All buffered data, with the exception of data allocated in special pools, is moveable by the garbage collector

Because of GC, allocation is very quick
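The reason a compacting collector makes allocation quick is that free space stays in one contiguous run, so allocating is a bounds check plus a pointer bump. A minimal sketch of that idea follows. `BufferPool`, `buffer_alloc`, and the 16-byte alignment granule are assumptions, and the collection step itself is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ALIGNMENT 16u   /* assumed alignment granule */

/* A compacted pool: everything below `top` is live, everything above is free. */
typedef struct {
    unsigned char *base;
    size_t         size;
    size_t         top;
} BufferPool;

/* Returns 0 on success, -1 if the backing memory could not be obtained. */
static int pool_init(BufferPool *p, size_t size) {
    p->base = malloc(size);
    p->size = size;
    p->top  = 0;
    return p->base != NULL ? 0 : -1;
}

/* Round n up to the next multiple of ALIGNMENT. */
static size_t align_up(size_t n) {
    return (n + (ALIGNMENT - 1)) & ~(size_t)(ALIGNMENT - 1);
}

/* Allocation is a bounds check plus a pointer bump. A real collector would
   compact and retry before reporting failure; this sketch returns NULL. */
static void *buffer_alloc(BufferPool *p, size_t n) {
    assert(p->base != NULL);
    if (n > p->size) return NULL;   /* also keeps align_up from wrapping */
    size_t need = align_up(n);
    if (need > p->size - p->top) return NULL;
    void *mem = p->base + p->top;
    p->top += need;
    return mem;
}
```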

Garbage Collection

“Bring out yer dead!”

The perl 6 GC is a copying collector

Everything except PMCs is moveable in Perl 6

PMCs might be moveable too

We get a compact memory heap out of this, which allows for fast allocation

Perl 6 will release empty memory back to the system when it can

Refcounts are used only to note object lifetimes, not for GC

Refcounts, for the most part, are dead

GC considerations for Objects

Garbage collection and object death are now separate things

Perl's guarantee of timely object death is stronger

We still don't guarantee perfect collection (but it sucks less)

We still refcount for real perl references, but only 2 bits are used

Objects with more than two simultaneous references won't get collected until a full dead variable scan is made
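A two-bit reference count can be read as a saturating counter: counts of zero, one, and two are exact, and the fourth value means "more than two, exact count lost". The sketch below shows that interpretation. Storing the count in the low bits of a flags word, and all the names, are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define RC_MASK   0x3u   /* low two bits of the flags word (an assumption) */
#define RC_STICKY 0x3u   /* "more than two": exact count is lost */

typedef struct { unsigned flags; } Obj;   /* stand-in for a PMC */

static unsigned rc_get(const Obj *o) { return o->flags & RC_MASK; }

static void rc_set(Obj *o, unsigned n) {
    o->flags = (o->flags & ~RC_MASK) | (n & RC_MASK);
}

/* Take a reference. Counts 0..2 are exact; a third reference saturates. */
static void rc_inc(Obj *o) {
    unsigned n = rc_get(o);
    if (n != RC_STICKY) rc_set(o, n + 1);
}

/* Drop a reference. Returns true when the object is known dead right now.
   A saturated count never comes back down, so such objects wait for the
   full dead-variable scan. */
static bool rc_dec(Obj *o) {
    unsigned n = rc_get(o);
    assert(n != 0);
    if (n == RC_STICKY) return false;
    rc_set(o, n - 1);
    return n == 1;
}
```

Objects that never exceed two references die promptly when the last one is dropped, while heavily shared objects fall back to the collector.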

Extensions beware!

Since we have no refcounts, extensions must tell perl when they hold on to PMCs

Not a huge deal, as we piggy-back on the cross-interpreter PMC tracking we use for threads

No more struct PMC; in extensions...

Extending Perl 6

Extensions Made Easier

Perl 6 will have a real API

The API is multilevel: simple for embedders, more complex for extension authors, pretty messy for vtable or opcode writers

Binary compatibility is a very strong consideration

Embedding

Guaranteed stable and binary compatible for the life of perl 6

Very simple API: create interpreter, destroy interpreter, parse source, run code, register native functions

Extensions

Much simpler interface to perl's internals

The gory details are hidden

Stable binary compatibility is a very strong goal

We may add functions or options, but we won't take them away

Extensions built for perl 6.0.1 should still run with perl 6.8.12 without rebuilding

Manipulating perl data should be much easier

If you have to resort to Inline to wrap a library then it means we've not got it right

Extensions (cont)

Inline, or something like it, is probably going to be the standard for extending perl

XS, when you have to resort to it, will be far less nasty than it is now

Homegrown Opcodes and Vtables

This is part of the grubby inside of perl 6

You can use any of the internal routines of perl

If you do, though, you may run into backward-compatibility issues at some point. (If it's not part of the embedding, utility, or extension API, we make no promises)

There's no guarantee that calling conventions won't change.

No guarantees that perl 6.4 will even use vtables or opcodes

Utility library

Perl 6 will provide a set of utility routines to handle common tasks: string manipulation, encoding changes (Shift-JIS to Unicode, EBCDIC to ASCII), conversion routines (string to int or float), and extended precision math (int and float)

These will be stable, like the rest of the API

Variations on a Theme

“Toccata and Fugue in perl minor by Wall”

The source doesn't have to be perl

The parser isn't obligated to be parsing perl

Input source could be Python, Ruby, Java, or INTERCAL

The full perl parser is optional

The interpreter doesn't have to interpret

The interpreter is the destination for bytecode, but it doesn't have to interpret it

It might save directly to disk

It might translate the bytecode into an alternate form (Java bytecode, .NET code, or executable code, for example)

The interpreter might translate to machine code on the fly, as a sort of JIT compiler. (Well, really a TIL, but...)

Documents
