Perl 6 Internals
Dan Sugalski, TPC 5.0
“Here there be dragons”
The big goals of perl 6's internals
Speed
Extendibility
Cleanliness
Compatibility
Modularity
Thread Safety
Flexibility
Some global decisions
The core will be in C. (Like it or not, it's appropriate for code at this level)
The core must be modular, so pieces can be swapped out without rebuilding
It must be fast
Long-term binary compatibility is a must
Your average perl coder or extension writer shouldn't need any info about the guts
Things should generally be thought out, documented, and engineered
The quick overview
Parser
Compiler
Optimizer
Runtime engine
[Figure: pipeline diagram. Stages: Parser, Compiler, Optimizer, Interpreter. Labels: Syntax Tree, Unoptimized Bytecode, Optimized Bytecode, Precompiled Bytecode, Fully-laden Interpreter]
The parser
Where the whole thing starts
Generally takes source of some sort and turns it into a syntax tree
The Bytecode Compiler
Turns a syntax tree into bytecode
Performs some simple optimization
The optimizer
Takes the plain bytecode from the compiler and abuses it heavily
An optional step, generally skipped for compile-and-go execution
Should be able to work on small parts of a program for JIT optimization
The Interpreter
Takes compiled (and possibly optimized) bytecode and does something with it
Generally that something is execution, but it might also be:
Save to disk
Translate to another format (.NET, Java bytecode)
Compile to machine code
The Parser
“Double, double, toil and trouble; Fire burn, and cauldron bubble”
Parser goals
Extendible in perl
More powerful than what we have now
Retargetable
Self-contained and removable
Parsing perl isn't easy
May well be one of the toughest languages to properly parse
If we get perl right other languages are easy. Or at least easier
We have the full power of perl to draw on to do the parsing (Including the regex engine and Damian's Bizarre Idea de Jour)
The parser will be in C
We will be using C for the parser
A full set of callbacks will be available to hook into the parser in lots of places
Adding new parsing rules (probably with regexes describing them) will be easy
The parser will be extendable via perl code
The Compiler
“Mmmmm, tasty!”
From syntax tree to bytecode
The compiler takes a syntax tree and turns it into bytecode
Very little optimization is done here. Optimization is expensive and optional
Pretty straightforward; this isn't rocket science
The Optimizer
“We can rebuild it. Make it better, faster, stronger”
The Optimizer
Takes plain bytecode and makes it faster
Does all the sorts of things that you expect an optimizer to do: code motion, loop unrolling, common subexpression work, etc.
Will be an iterative process
This will be interesting, as perl's a pain to optimize
An optional step, of course
Things that make optimizing perl tough
Active data
Runtime redefinitions of everything
Really, really late binding (Waiting for Godot late)
Perl programmers are used to more predictable runtime characteristics than, say, C programmers.
The Interpreter
“Polly want a cracker?”
Interpreter goals
Fast
Tuned for perl
Language neutral where possible
Event capable
Sandboxable
Asynchronous I/O built in
Built with an eye towards TIL and/or native code compilation
Better debugging support than perl 5
The perl 6 interpreter is a software CPU
Complete with registers and an assembly language
This can make translating perl 6 bytecode into native machine code easier
There's a lot of literature on building optimizing compilers that can be leveraged
While more complex than a pure stack-based machine, it's also faster
Opcode dispatch needs to be faster than perl 5
Opcode functions can be written in perl
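To make the "software CPU" idea concrete, here is a minimal sketch of a register machine with a switch-based dispatch loop. The opcode names, the four-field instruction layout, and the `Interp` type are all made up for illustration; the slides do not specify the actual instruction set or dispatch mechanism.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcode set for illustration; not the real perl 6 opcodes. */
enum { OP_SET, OP_ADD, OP_MUL, OP_END };

#define NUM_INT_REGS 64   /* register count taken from the "CPU specs" slide */

typedef struct {
    long int_reg[NUM_INT_REGS];   /* integer register file */
} Interp;

/* Assumed instruction layout: opcode, destination register, two operands. */
typedef struct {
    int op, dest, a, b;
} Instr;

/* Run bytecode until OP_END; returns the number of instructions executed.
   Register indices are not bounds-checked in this sketch. */
size_t run(Interp *in, const Instr *code)
{
    size_t count = 0;
    assert(in != NULL && code != NULL);
    for (const Instr *pc = code; pc->op != OP_END; pc++, count++) {
        switch (pc->op) {
        case OP_SET:   /* operand a is an immediate value */
            in->int_reg[pc->dest] = pc->a;
            break;
        case OP_ADD:   /* operands a and b are register numbers */
            in->int_reg[pc->dest] = in->int_reg[pc->a] + in->int_reg[pc->b];
            break;
        case OP_MUL:
            in->int_reg[pc->dest] = in->int_reg[pc->a] * in->int_reg[pc->b];
            break;
        default:
            assert(!"unknown opcode");
        }
    }
    return count;
}
```

Because operands name registers directly, each opcode body is a line or two with no stack shuffling, which is roughly the trade-off the slide describes against a pure stack machine.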
CPU specs
64 int, float, string, and PMC registers
A segmented multiple stack architecture
Interrupt-capable (for events)
Pretty much completely position independent: everything is referenced via register, pad entry, or name
The regex engine
The regex engine is going to be part of the perl 6 CPU, not separate as it is now
A good incentive to get opcode dispatch fast
Makes expanding the regex engine a bit easier
Details will be hidden as a set of regex opcodes
A few words on the stack system
Each register file has an associated stack
All registers of a particular type can be pushed onto or popped off the stack in one go
Individual registers or groups of registers can be pushed or popped
The stacks are all segmented so we're not relying on finding contiguous chunks of memory for them
There's also a set of call and scratch stacks
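One way to picture a segmented register stack is as a linked list of fixed-size chunks, each holding a few saved copies of a whole register file. The sketch below assumes that design; the segment capacity, type names, and function names are invented for illustration and are not taken from the slides.

```c
#include <assert.h>
#include <stdlib.h>

#define NUM_REGS 64         /* register count from the "CPU specs" slide */
#define FRAMES_PER_SEG 4    /* assumed segment capacity, kept small for illustration */

typedef struct { long reg[NUM_REGS]; } IntRegs;

/* One segment holds a few saved register frames and links to the older segment,
   so the stack never needs one large contiguous allocation. */
typedef struct Segment {
    struct Segment *prev;
    int used;
    IntRegs frame[FRAMES_PER_SEG];
} Segment;

typedef struct { Segment *top; } RegStack;

/* Push the whole register file in one go; returns 0 on success, -1 if out of memory. */
int push_all(RegStack *s, const IntRegs *r)
{
    assert(s != NULL && r != NULL);
    if (s->top == NULL || s->top->used == FRAMES_PER_SEG) {
        Segment *seg = malloc(sizeof *seg);   /* grow by one segment */
        if (seg == NULL) return -1;
        seg->prev = s->top;
        seg->used = 0;
        s->top = seg;
    }
    s->top->frame[s->top->used++] = *r;
    return 0;
}

/* Pop the most recent frame back into the registers; returns -1 if the stack is empty. */
int pop_all(RegStack *s, IntRegs *r)
{
    assert(s != NULL && r != NULL);
    if (s->top == NULL) return -1;
    *r = s->top->frame[--s->top->used];
    if (s->top->used == 0) {                  /* segment drained: release it */
        Segment *old = s->top;
        s->top = old->prev;
        free(old);
    }
    return 0;
}
```

Pushing or popping individual registers would follow the same pattern with a smaller frame type.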
Bytecode
“Could you say that a little differently?”
What is bytecode?
A distilled version of a program
Machine language for the PVM
Can contain a lot of 'extra' information, including full source
Designed to be platform independent
Should be mostly mappable as shared data (modulo the fixup sections)
Data Structures
“Vtables and strings and floats, oh my!”
Variables
Vtable Pointer
Data Pointer
Integer Value
Float Value
Flags
Synchronization
GC Data
Generically called a PMC
Bigger than Perl 5's base data structure
Synchronization data built-in
Same for all variable types
GC data is not part of base structure
Scalars
Built off the base PMC structure
Use the integer and float areas as caches
Data pointer points off to string, large int, or large float
Vtable functions determine how it all works
Arrays
Built off the base PMC structure
Data pointer points to array data
All perl 6 arrays are typed
May have an array of scalars, strings, integers, or floats
Arrays only take up enough memory to hold their types
Hashes
Built off the base PMC structure
Data pointer points to array data
All perl 6 hashes are typed
May have a hash of scalars, strings, integers, or floats
Hashes only take up enough memory to hold their types
Hashing function is overridable
Strings
Encoding
Type
Buffer Start
Buffer Length
String Length
String Size
Flags
Unused
Strings are sort of abstract
Perl 6 can mix and match string data (Unicode, ASCII, EBCDIC, etc)
New string types can be loaded on the fly
String handling
Perl 6 has no 'built-in' string support—all string support is via loadable libraries
There'll be Unicode, ASCII, and EBCDIC support provided (at least) to start
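A rough sketch of how a string header might defer to a loadable encoding library is shown here. The struct fields follow the names on the Strings slide, but their exact meanings (for example, "String Length" as characters and "String Size" as bytes) are guesses, and the `Encoding` function table is a hypothetical stand-in for whatever interface the string libraries would actually expose.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-encoding function table; a loadable string library
   would supply one of these. */
typedef struct {
    const char *name;
    size_t (*char_count)(const unsigned char *buf, size_t bytes);
} Encoding;

/* Field names follow the slide; their exact meanings are assumptions. */
typedef struct {
    const Encoding *encoding;
    int type;
    unsigned char *buffer_start;
    size_t buffer_length;   /* bytes allocated (assumed) */
    size_t string_length;   /* characters (assumed) */
    size_t string_size;     /* bytes in use (assumed) */
    unsigned flags;
} String;

static size_t ascii_count(const unsigned char *buf, size_t bytes)
{
    (void)buf;
    return bytes;                       /* one byte per character */
}

static size_t utf8_count(const unsigned char *buf, size_t bytes)
{
    size_t n = 0;
    for (size_t i = 0; i < bytes; i++)
        if ((buf[i] & 0xC0) != 0x80)    /* count every byte that is not a continuation byte */
            n++;
    return n;
}

const Encoding ascii_encoding = { "ascii", ascii_count };
const Encoding utf8_encoding  = { "utf8",  utf8_count };

/* Fill in a header for an existing buffer, asking the encoding for the character count. */
void string_init(String *s, const Encoding *enc, unsigned char *buf, size_t bytes)
{
    assert(s != NULL && enc != NULL);
    s->encoding = enc;
    s->type = 0;
    s->buffer_start = buf;
    s->buffer_length = bytes;
    s->string_size = bytes;
    s->string_length = enc->char_count(buf, bytes);
    s->flags = 0;
}
```

The core code only ever calls through `encoding`, so adding EBCDIC or Shift-JIS would mean supplying another table rather than touching the header logic.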
Numbers
Bigints and bigfloats share the same header
Arbitrary-length floating point and integer numbers are supported
Perl automagically upgrades ints and floats when needed
Buffer Pointer
Length
Exponent
Flags
Vtables
All variable data access is done through a table of functions that the variable carries around with it
This allows us faster access, since code paths are specialized for just the functions they need to perform
Isolates us from the implementation of variables internally
Allows special purpose behaviour (like perl 5's magic) to be attached without cost to the rest of perl
Vtables (cont'd)
Makes thread safety easier
A little bit more overhead because of the extra level of indirection, but the smaller functions make up for that
Vtable functions can be written in perl. (Each class with objects blessed into it will have at least one)
There may be more than one vtable per package
Vtables hide data manipulation
Pretty much all the code to handle data manipulation will be done via variable vtables
This allows the variable implementation to change without perl needing to know
Allows far more flexibility in what you can make a variable do
Shortens the code path for data functions and trims out extraneous conditionals
For example: Fetching the string value of a scalar
For scalars with strings:
String *get_str(PMC *my_PMC) {
    return my_PMC->data_pointer;
}
For int-only scalar:
String *get_str(PMC *my_PMC) {
    my_PMC->data_pointer = make_string(my_PMC->integer);
    my_PMC->vtable = int_and_string_vtable;
    return my_PMC->data_pointer;
}
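The two functions above can be fleshed out into something compilable. This is only a sketch: it uses plain C strings instead of the deck's `String` type, a one-entry vtable, and a cut-down `PMC` struct, and names such as `int_only_vtable` and `make_int_pmc` are invented for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct PMC PMC;

/* A vtable here carries a single entry; a real one would have many. */
typedef struct {
    const char *(*get_str)(PMC *pmc);
} VTable;

/* Cut-down PMC: only the fields this example needs. */
struct PMC {
    const VTable *vtable;
    char *data_pointer;   /* string value, once one exists */
    long integer;         /* integer value */
};

static const char *str_get_str(PMC *pmc);
static const char *int_get_str(PMC *pmc);

static const VTable int_and_string_vtable = { str_get_str };
static const VTable int_only_vtable       = { int_get_str };

/* Scalar already has a string: just hand it back. */
static const char *str_get_str(PMC *pmc)
{
    assert(pmc != NULL);
    return pmc->data_pointer;
}

/* Int-only scalar: build the string once, then swap vtables so later
   calls take the short path above. Returns NULL if allocation fails. */
static const char *int_get_str(PMC *pmc)
{
    char buf[32];
    assert(pmc != NULL);
    snprintf(buf, sizeof buf, "%ld", pmc->integer);
    pmc->data_pointer = malloc(strlen(buf) + 1);
    if (pmc->data_pointer == NULL) return NULL;
    strcpy(pmc->data_pointer, buf);
    pmc->vtable = &int_and_string_vtable;
    return pmc->data_pointer;
}

/* Create an int-only scalar (hypothetical helper). */
PMC make_int_pmc(long value)
{
    PMC p = { &int_only_vtable, NULL, value };
    return p;
}
```

A caller always writes `p.vtable->get_str(&p)`; the first call on an int-only scalar does the conversion, and every later call goes straight to the cached string with no type checks along the way.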
Memory Management
“Now where did I put that?”
Getting headers
All the fixed-size things (PMCs, string/number headers) get allocated from arenas
All headers, with the exception of PMCs (maybe) are moveable by the garbage collector
Non-PMC header allocation is very fast
PMC allocation is only mostly fast
Buffer Management
Anything that isn't a fixed size gets allocated from the buffer pools
All buffered data, with the exception of data allocated in special pools, is moveable by the garbage collector
Because of GC, allocation is very quick
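The reason a compacting collector makes allocation quick is that free space stays in one contiguous run, so allocating can be a simple pointer bump. The following is a minimal sketch under that assumption; `BufferPool`, the 8-byte alignment, and the function names are illustrative, and the collector itself is not shown.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
    unsigned char *base;  /* start of the pool */
    size_t size;          /* total bytes in the pool */
    size_t used;          /* bytes handed out so far */
} BufferPool;

void pool_init(BufferPool *p, void *memory, size_t size)
{
    assert(p != NULL && memory != NULL);
    p->base = memory;
    p->size = size;
    p->used = 0;
}

/* Allocate by bumping an offset. Returns NULL when the pool is full, which
   is the point where a collector would compact the pool and retry. */
void *pool_alloc(BufferPool *p, size_t bytes)
{
    assert(p != NULL);
    size_t aligned = (bytes + 7u) & ~(size_t)7u;   /* round up to a multiple of 8 */
    if (aligned < bytes || aligned > p->size - p->used)
        return NULL;
    void *out = p->base + p->used;
    p->used += aligned;
    return out;
}
```

There is no free list to search and no per-block bookkeeping; reclaiming space is left entirely to the collector.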
Garbage Collection
“Bring out yer dead!”
The perl 6 GC is a copying collector
Everything except PMCs is moveable in Perl 6
PMCs might be moveable too
We get a compact memory heap out of this, which allows for fast allocation
Perl 6 will release empty memory back to the system when it can
Refcounts are used only to note object lifetimes, not for GC
Refcounts, for the most part, are dead
GC considerations for Objects
Garbage collection and object death are now separate things
Perl's guarantee of timely object death is stronger
We still don't guarantee perfect collection (but it sucks less)
We still refcount for real perl references, but only 2 bits are used
Objects with more than two simultaneous references won't get collected until a full dead variable scan is made
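A 2-bit reference count can be read as a saturating counter: it tracks one or two references exactly, and once a third appears it sticks at its maximum and the object is left for the full scan. The sketch below assumes that reading; the `ObjHeader` layout, the `dead` flag, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define REF_STICKY 3u   /* assumed: the top 2-bit value means "too many to track" */

typedef struct {
    unsigned refs : 2;   /* 0..3 */
    unsigned dead : 1;   /* set when the count drops to zero */
} ObjHeader;

/* Add a reference; the count saturates at REF_STICKY. */
void ref_inc(ObjHeader *o)
{
    assert(o != NULL);
    if (o->refs < REF_STICKY)
        o->refs++;
}

/* Drop a reference. Returns 1 if the object died promptly, 0 otherwise.
   A sticky count is never decremented; only a full scan can reclaim it. */
int ref_dec(ObjHeader *o)
{
    assert(o != NULL);
    if (o->refs == REF_STICKY || o->refs == 0)
        return 0;
    o->refs--;
    if (o->refs == 0) {
        o->dead = 1;
        return 1;
    }
    return 0;
}
```

Objects that only ever have one or two references, which is the common case, still get timely destruction at the cost of two bits per object.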
Extensions beware!
Since we have no refcounts, extensions must tell perl when they hold on to PMCs
Not a huge deal, as we piggy-back on the cross-interpreter PMC tracking we use for threads
No more struct PMC; in extensions...
Extending Perl 6
Extensions Made Easier
Perl 6 will have a real API
The API is multilevel:
Simple for embedders
More complex for extension authors
Pretty messy for vtable or opcode writers
Binary compatibility is a very strong consideration
Embedding
Guaranteed stable and binary compatible for the life of perl 6
Very simple API:
Create interpreter
Destroy interpreter
Parse source
Run code
Register native functions
Extensions
Much simpler interface to perl's internals
The gory details are hidden
Stable binary compatibility is a very strong goal
We may add functions or options, but we won't take them away
Extensions built for perl 6.0.1 should still run with perl 6.8.12 without rebuilding
Manipulating perl data should be much easier
If you have to resort to Inline to wrap a library then it means we've not got it right
Extensions (cont)
Inline, or something like it, is probably going to be the standard for extending perl
XS, when you have to resort to it, will be far less nasty than it is now
Homegrown Opcodes and Vtables
This is part of the grubby inside of perl 6
You can use any of the internal routines of perl
If you do, though, you may run into backward-compatibility issues at some point. (If it's not part of the embedding, utility, or extension API, we make no promises)
There's no guarantee that calling conventions won't change.
No guarantees that perl 6.4 will even use vtables or opcodes
Utility library
Perl 6 will provide a set of utility routines to handle common tasks:
String manipulation
Encoding changes (Shift-JIS to Unicode, EBCDIC to ASCII)
Conversion routines (string to int or float)
Extended precision math (int and float)
These will be stable, like the rest of the API
Variations on a Theme
“Toccata and Fugue in perl minor by Wall”
The source doesn't have to be perl
The parser isn't obligated to be parsing perl
Input source could be Python, Ruby, Java, or INTERCAL
The full perl parser is optional
The interpreter doesn't have to interpret
The interpreter is the destination for bytecode, but it doesn't have to interpret it
It might save directly to disk
It might translate the bytecode into an alternate form: Java bytecode, .NET code, or executable code, for example
The interpreter might translate to machine code on the fly, as a sort of JIT compiler. (Well, really a TIL, but...)