Java Virtual Machine - Kirkwood Community

  • Published on
    17-Mar-2018

Transcript

Java Virtual Machine part 1

One more architecture: the Java Virtual Machine
  • JVM is a virtual machine that executes Java Byte Code (JBC)
  • Start with Java source code (.java text file)
  • Java compiler (javac) creates JBC file with .class extension
      • standardized binary format
      • single program consists of one or more class files
      • multiple class files may be packaged together as a .jar file for program distribution

JVM execution environment
  • JVM is represented by an executable program called java
  • Emulates the JVM instruction set (interpreter)
  • JBC is stack based, so JVM uses stack architecture

Why a virtual machine?
  • Idea didn't originate with Java; it comes from time-sharing systems (circa 1960s)
  • VM advantages:
      • platform independence
      • transcends physical limits
      • ease of updates
      • security safeguards

Platform independence
  • Java compiler is platform-independent: makes no assumptions about characteristics of underlying hardware
  • JVM required to run Java byte code
  • Works as a wrapper around a real machine's architecture, so the JVM itself is extremely platform dependent

Java environment vs.
traditional HLL environment
  [Diagram labels: source code (Java or other HLL); compiler/linker; machine language; Java compiler; class file (JBC); Java virtual machine; hardware platform]

How it works
  • Java compiler translates source code into JBC
  • JVM acts as interpreter: translates byte codes into machine instructions specific to the hardware platform it's running on
  • Acts like giant switch/case structure: each bytecode instruction triggers jump to a specific block of code that implements the instruction in the architecture's native machine language

JVM's superpower: transcends physical limits
  • No hardware costs (both $ and resource tradeoffs)
  • Because of multithreading, can have (seemingly) unlimited processor power
  • No backward compatibility issues
  • Can be adapted to optimize hardware resources of specific platform
  • Designed from scratch in mid-90s: several generations of engineering experience led to design superior to most physical chips

JVM and security issues
  • A virtual machine can be (and JVM is) configured to run in secure environment
      • VM can intervene if a program tries to do something it shouldn't
      • can enforce stricter security policies than those of OS
  • JVM bytecode is verifiable
      • most security flaws happen by accident
      • byte code is checked by both compiler and JVM
      • result: improved software quality & reliability

Downside of virtual machines
  • Before Java, virtual machines relatively uncommon because of performance issues
  • Can take about 1000 times longer to do an operation in software instead of hardware
      • hardware advances & compiler improvements mitigate this
      • in practical terms, the slowdown ranges from about 6% to about 2x
  • VM doesn't provide the direct control over hardware available with a native low-level language

Characteristics of JVM
  • Not just a virtual machine; something like a virtual operating system
  • Case in point: an output statement such as printf()
      • Compiled C program calls the operating system's write() function
      • Compiled Java program makes similar call, but to a JVM routine
        which then calls the real write() function

Java & threading
  • Because the JVM is a virtual machine, it is free of some of the constraints of a real machine architecture
  • Java threads exemplify this
      • separate threads of execution running in parallel
      • simulates a multi-processor environment independent of actual platform

Characteristics of JVM
  • Stack-based language & machine
  • Each thread within a program has its own stack
  • 32-bit word size
  • Relatively small instruction set (about 200 instructions)

Characteristics of JVM: registers
  • 4 registers (sort of):
      • program counter (PC)
      • optop: points to top of operand stack for currently-active method
      • frame: points to stack frame for current method
      • vars: points to start of local variables for current method
  • Each program (or thread) has these, as well as its own stack

Characteristics of JVM: registers
  • No general-purpose registers means more memory fetches, detrimental to performance
  • Tradeoff is high degree of portability
  • Most instructions access stack

Characteristics of JVM: stack memory
  • Each method call produces its own stack frame, which is pushed on the thread's stack; a return instruction pops the stack
  • Stack frame includes:
      • local variables section
      • operand stack section

Local variables section of stack frame
  • Consists of set of word-size slots that hold the method's variables (long and double values occupy two consecutive slots)
  • Includes parameters & locally-declared data, in order of declaration
  • If method is non-static, first slot (slot 0) contains pointer to this

Operand stack section of stack frame
  • Operand stack section is where the method's instructions operate: the stack referred to when talking about instructions operating on the stack
  • Maximum depth of operand stack is determined at compile time
  • Current stack depth is determined by number & type of operands on stack:
      • double and long values take up two slots
      • all other data types take one slot

JVM Method area
  • Stores classes used by executing program; includes:
      • bytecode & access types of methods
      • values & access types of static variables
  • PC points
    to this area (the location of the next instruction)
  • Method area also includes constant pool: storage for literal values used in program

JVM Heap
  • Memory allocated for objects from this area
  • Holds object's instance values and pointer to object's class in the Method area

JVM instruction set
  • Instructions consist of one-byte opcode followed by 0 or more operands
  • Instruction types include:
      • load/store of local variables & object fields
      • array
      • arithmetic and logical
      • type conversion
      • control
      • method call/return

Java Byte Code
  • As assembly language code is to most HLLs, JBC is to Java
  • Although most programmers work at the high level, a thorough understanding of the lower level helps us achieve better performing, lower cost software

JVM instructions*
  • A JVM instruction consists of a one-byte opcode specifying the operation to be performed, followed by zero or more operands supplying arguments or data that are used by the operation
  • Many instructions have no operands and consist only of an opcode
  * This and the next several slides are almost verbatim from the official reference on all things JVM; the quoted parts are in purple: http://java.sun.com/docs/books/jvms/second_edition/html/

The JVM Loop
  • Ignoring exceptions, the inner loop of a Java virtual machine interpreter is effectively
        do {
            fetch an opcode;
            if (operands) fetch operands;
            execute the action for the opcode;
        } while (there is more to do);

Opcodes and operands
  • The number and size of the operands are determined by the opcode
  • If an operand is more than one byte in size, then it is stored in big-endian order
  • The bytecode instruction stream is only single-byte aligned
  • Not assuming data alignment means that immediate data larger than a byte must be constructed from bytes at run time on many machines

JBC Data Types
  • Correspond closely to Java types; conspicuous for its absence is boolean, which in JBC is stored as an int
  • This is because it is no more (and is likely to be less) efficient
    in most real architectures to access a single bit as opposed to a single (32-bit) word, so boolean values are stored as 1 or 0
  • Other sub-word storage types (byte, short and char) are promoted to word type for arithmetic operations (implicit promotion, to us), but that's in the stack, not in memory
  • Operations on these types are, effectively, int operations

JBC data types
  Data type   JBC code   Explanation
  int         i          32-bit signed integer
  float       f          32-bit IEEE 754 floating point number
  long        l          64-bit signed integer (takes 2 stack slots)
  double      d          64-bit IEEE 754 floating point number (takes 2 stack slots)
  byte        b          8-bit signed integer
  short       s          16-bit signed integer
  char        c          16-bit unsigned integer or Unicode (UTF-16) character
  address     a          object references

JVM stack frames
  • Recall that each thread or program has its own JVM stack to store frames
  • Frames are created when methods are invoked
  • Frame consists of:
      • operand stack
      • local variable table (array)
      • pointer to the runtime constant pool of the current method's class

JVM stack frames
  • Sizes of both operand stack and local variable table are determined at compile time
  • Operand stack stores:
      • operands for opcode instructions
      • operation results
      • return values from methods

JVM instructions
  • Data typing in Java requires type-specific instructions; thus, for example, the add instruction comes in four different flavors:
      • iadd: adds integers
      • ladd: adds longs
      • fadd: adds floats
      • dadd: adds doubles
  • Similar instructions exist for other arithmetic operations

Arithmetic instructions
  • Each binary arithmetic instruction works as follows:
      • top 2 elements are popped off stack
      • result is computed
      • result is pushed to stack
  • Elements may be one or two words in size:
      • ints and floats: 32 bits, single word
      • doubles and longs: 64 bits, two words each

Arithmetic instructions
  • The remainder operation exists for all four basic types: irem, lrem, frem and drem
  • On the high level, % applied to float or double operands compiles to frem or drem, and the result need not be a whole number (5.5 % 2 is 1.5)

Data typing and arithmetic instructions
  • Mixed-type expressions must have all operands converted to single data type for evaluation
  • Unary conversion operations exist to facilitate this:
      • i2f: converts int to float
      • i2b: converts int to byte
      • etc.
  • Always possible to convert between the 4 basic types (i, f, l and d); an int can also be narrowed to byte, char or short (i2b, i2c, i2s)

Logical & shift operations
  • Operate on integer types only (int and long)
  • Logical operations include and, or and xor; examples:
      • land: and on 2 longs
      • ixor: xor on 2 ints
  • Shift operations on ints include ishl, ishr and iushr (unsigned shift right); similar operations exist for longs

Data access operations
  • Load/store instructions exist for the 4 primitive types; transfer values between local variable table & operand stack
      • iload: push int variable on stack
      • dstore: store stack value in local double variable
  • const versions (such as iconst_0) push literal values onto the stack
  • Can also load/store object references using aload/astore

Object creation & manipulation
  • Although both class instances and arrays are objects, the Java virtual machine creates and manipulates class instances and arrays using distinct sets of instructions:
      • Create a new class instance: new
      • Create a new array: newarray, anewarray, multianewarray

Object creation & manipulation
  • Access static fields and instance variables: getfield, putfield, getstatic, putstatic
  • Load an array component onto the operand stack: iaload, laload, faload, daload, aaload, etc.
  • Store a value from the operand stack as an array component: iastore, etc.
  • Get the length of array: arraylength

Stack manipulation & method handling
  • Several instructions operate directly on the operand stack, including pop, pop2, swap, and several others
  • Method invocations are handled by instructions specific to the type of method:
      • invokevirtual: starts an instance method
      • invokestatic: starts a static method
      • invokeinterface: starts a method specified by an interface
  • Various return instructions are used to return values from methods (ireturn, dreturn, etc.; also just return for void methods)

Java class files
  • Consists of stream of bytes
  • Class file data types describe the various fields in the class file format:
      • u1: unsigned 1-byte number
      • u2: unsigned 2-byte number
      • u4: unsigned 4-byte number
  • The next several slides describe the class file format in depth

ClassFile structure
    ClassFile {
        u4 magic;
        u2 minor_version;
        u2 major_version;
        u2 constant_pool_count;
        cp_info constant_pool[constant_pool_count-1];
        u2 access_flags;
        u2 this_class;
        u2 super_class;
        u2 interfaces_count;
        u2 interfaces[interfaces_count];
        u2 fields_count;
        field_info fields[fields_count];
        u2 methods_count;
        method_info methods[methods_count];
        u2 attributes_count;
        attribute_info attributes[attributes_count];
    }

Fields in ClassFile structure
  • magic: used to identify this file as a class; the magic number is the hex value CAFEBABE (no, I'm not kidding)
  • minor_version and major_version are class file versions; values must fall within a range of numbers (defined by Sun) in order to be runnable on a particular JVM

Fields in ClassFile structure
  • constant_pool[] and constant_pool_count: constant_pool is an array of string literals, class, interface and field descriptors that are referenced in the ClassFile structure; constant_pool_count is one more than the number of entries in constant_pool (entries are indexed from 1)
  • access_flags: set of flags indicating access information (public, private, etc.)
    about the class or interface

Fields in ClassFile structure
  • this_class: must be valid index to constant_pool; entry at that index is structure describing (very briefly) the current class
  • super_class: 0 only if this class has no superclass (java.lang.Object); otherwise, must be valid index to constant_pool; entry at index describes the superclass

Fields in ClassFile structure
  • interfaces[] and interfaces_count: the latter is the number of direct superinterfaces of the current class; the former is an array whose entries are valid indexes to the constant_pool, where the entries are structures describing this class's superinterfaces
  • fields[], fields_count, methods[], methods_count, attributes[] and attributes_count: more of the same

Example descriptor field
  • A method is described by a structure with the following format:
    method_info {
        u2 access_flags;
        u2 name_index;
        u2 descriptor_index;
        u2 attributes_count;
        attribute_info attributes[attributes_count];
    }

Digging in a little deeper
  • The method_info structure (which is itself an entry in the methods[] array) contains another structure, attribute_info[]
  • As the name suggests, this structure is an array of method attributes
  • Attributes include constant values, code, exceptions and several others; we will confine our discussion to the first two

ConstantValue attribute
  • Fixed-length structure representing value of a static constant; descriptor is:
    ConstantValue_attribute {
        u2 attribute_name_index;
        u4 attribute_length;
        u2 constantvalue_index;
    }
  • Both indexes refer to the constant_pool, where they must match legitimate entries; value of attribute_length for ConstantValue is 2

Code_attribute
  • Contains actual JVM instructions for a method
  • Descriptor:
    Code_attribute {
        u2 attribute_name_index;
        u4 attribute_length;
        u2 max_stack;
        u2 max_locals;
        u4 code_length;
        u1 code[code_length];
        u2 exception_table_length;
        {   u2 start_pc;
            u2 end_pc;
            u2 handler_pc;
            u2 catch_type;
        } exception_table[exception_table_length];
        u2 attributes_count;
        attribute_info attributes[attributes_count];
    }

Code_attribute fields
  • max_stack and max_locals give the size of the operand stack and local variable table for the method's frame
  • code_length and code[] are the number of bytes of bytecode and an array containing the instructions themselves

Code_attribute fields
  • exception_table[] is an ordered array of exception handler descriptors; exception_table_length is the array size
  • start_pc and end_pc indicate the indexes of the code[] array that define the range within which the exception handler is active

Even imagination has its limits
  • The JVM, because it isn't real, has access to theoretically unlimited resources
  • Of course, it still has to exist in the real world, and both hardware and the JVM spec itself impose constraints
  • The next several slides describe some of these

Size matters
  • Consider the following ClassFile attributes:
        u2 constant_pool_count;
        u2 fields_count;
        u2 methods_count;
        u2 attributes_count;
  • All of these are type u2: unsigned 2-byte integer; the maximum value of an unsigned 2-byte integer is 65,535
  • Thus, each of the arrays described by the various count attributes is limited to ~64K entries

Size matters (again)
  • The Code_attribute descriptor includes these fields:
        u2 max_stack;
        u2 max_locals;
        u4 code_length;
  • Thus the operand stack and local variable table for a method are subject to the same 64K limit
  • The code_length attribute implies that the code could exceed that limit, but the actual code length is limited to 65,535 bytes, because the exception_table indexes into code[] (start_pc, end_pc, handler_pc) are u2 values

Last word on sizes
  • 255 is the maximum number of:
      • array dimensions
      • parameters to a single method
  • 65,535 is the maximum length of:
      • identifiers
      • String literals
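
To make the u2/u4 field types and the first few ClassFile fields concrete, here is a minimal sketch (not part of the original slides) that decodes magic, minor_version, major_version and constant_pool_count from the start of a class file. The class name ClassHeaderSketch, the method describe and the hand-built 10-byte header are hypothetical and exist only for illustration. The sketch relies on java.nio.ByteBuffer reading in big-endian order by default, which matches the class file's byte order.

```java
import java.nio.ByteBuffer;

class ClassHeaderSketch {
    // Decodes the first four ClassFile fields from raw bytes (illustrative helper, not a JDK API).
    static String describe(byte[] classBytes) {
        ByteBuffer in = ByteBuffer.wrap(classBytes);   // big-endian by default
        int magic = in.getInt();                       // u4 magic
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a class file");
        }
        int minor = in.getShort() & 0xFFFF;            // u2, masked so it reads as unsigned
        int major = in.getShort() & 0xFFFF;            // u2
        int cpCount = in.getShort() & 0xFFFF;          // u2, so it can never exceed 65,535
        return "major=" + major + " minor=" + minor + " constant_pool_count=" + cpCount;
    }

    public static void main(String[] args) {
        // Hand-built header for illustration only: magic, minor 0, major 52, pool count 16.
        byte[] header = {(byte) 0xCA, (byte) 0xFE, (byte) 0xBA, (byte) 0xBE, 0, 0, 0, 52, 0, 16};
        System.out.println(describe(header)); // prints major=52 minor=0 constant_pool_count=16
    }
}
```

The mask with 0xFFFF is the design point: Java's short is signed, so a u2 value above 32,767 would otherwise come back negative.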

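Returning to the fetch/execute loop and the operand stack described earlier in the transcript, the following is a toy sketch (not from the slides, and nowhere near a real JVM) of an interpreter that handles four instructions with a switch. The class name MiniStackMachine and the method run are hypothetical; the opcode values are the ones the JVM specification assigns to bipush, iadd, imul and ireturn. It ignores frames, local variables, type checking and every other instruction.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class MiniStackMachine {
    // Opcode values as assigned in the JVM specification.
    static final int BIPUSH = 0x10, IADD = 0x60, IMUL = 0x68, IRETURN = 0xAC;

    // Interprets a tiny bytecode sequence and returns the int left by ireturn.
    static int run(byte[] code) {
        Deque<Integer> operandStack = new ArrayDeque<>();
        int pc = 0;                                   // program counter: index into code[]
        while (true) {
            int opcode = code[pc++] & 0xFF;           // fetch a one-byte opcode
            switch (opcode) {                         // execute the action for the opcode
                case BIPUSH:
                    operandStack.push((int) code[pc++]); // one signed byte operand follows
                    break;
                case IADD: {
                    int right = operandStack.pop();
                    int left = operandStack.pop();
                    operandStack.push(left + right);
                    break;
                }
                case IMUL: {
                    int right = operandStack.pop();
                    int left = operandStack.pop();
                    operandStack.push(left * right);
                    break;
                }
                case IRETURN:
                    return operandStack.pop();
                default:
                    throw new IllegalStateException("unsupported opcode 0x" + Integer.toHexString(opcode));
            }
        }
    }

    public static void main(String[] args) {
        // bipush 2, bipush 3, iadd, bipush 4, imul, ireturn  =>  (2 + 3) * 4
        byte[] code = {0x10, 2, 0x10, 3, 0x60, 0x10, 4, 0x68, (byte) 0xAC};
        System.out.println(run(code)); // prints 20
    }
}
```

Each case pops its operands and pushes its result, which is the pop/compute/push pattern the arithmetic slides describe; a real interpreter adds one case per opcode in the same style.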