8/12/2019 Amp Dec 2013
Q1. Instruction format of SPARC
b) Protection mechanism
Addressing Memory
With the Intel Architecture (regardless of mode), all memory references are offsets from a base address. The base address is determined by the contents of a segment register; the value held in the register is called a selector. The segment registers CS, SS, DS, ES, FS and GS each reference a particular kind of segment: code, data, or stack. The CS register indicates which segment holds the currently executing code. The instruction pointer (EIP) is the offset, from the beginning of the code segment, of the next instruction to be fetched. Intersegment control transfers (calls, jumps, returns, interrupts and exceptions) modify the contents of the CS register. All stack operations use the SS register to locate the stack segment. The DS, ES, FS and GS registers are used to access up to four separate data areas (FS and GS were made available on the 80386 processor generation).
For real mode or an 8086/80186 processor, the selector value (loaded into a segment register) represents the upper 16 bits of the 20-bit linear address. For protected mode, the selector is an index into a descriptor table. The descriptor referenced by the selector holds the full 32-bit base portion of the memory address, as well as other information (Figure 1). Until the segment register is loaded with a new value, all memory references using that segment add a 32-bit offset to the base to determine the linear address. If the paging unit is enabled, the 32-bit linear address is translated into a 32-bit physical address. If the paging unit is not enabled, the 32-bit linear address is the physical address. Segments in protected mode are not limited to 64KB as in the 8086 processor, but can be up to 4GB in size.
Descriptors
There are two classes of descriptors: system and segment. For addressing memory we need only concern ourselves with segment descriptors. System descriptors such as gates, tasks and LDTs are discussed where appropriate. All descriptors are 8 bytes each and reside in one of the descriptor tables (see Descriptor Tables section). The segment
descriptor describes the attributes of each segment: where the segment begins in memory (its base address), the size of the segment, the type of segment, and the access rights (Figure 2 & Table 1).
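The two addressing schemes described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the `(base, limit)` tuple standing in for a descriptor are hypothetical, the descriptor table is modeled as a plain dictionary, and the index is taken from the upper 13 bits of the selector as in the x86 selector layout (a detail not spelled out in the text above).

```python
def real_mode_linear(selector, offset):
    """Real mode: the 16-bit selector supplies the upper 16 bits of a 20-bit address."""
    # Shift the selector left by 4 bits, add the offset, and wrap to 20 bits.
    return ((selector << 4) + offset) & 0xFFFFF


def protected_mode_linear(descriptor_table, selector, offset):
    """Protected mode: the selector indexes a table of (base, limit) descriptors.

    descriptor_table is a hypothetical dict {index: (base, limit)}.
    """
    index = selector >> 3  # assumption: bits 15..3 of the selector hold the index
    base, limit = descriptor_table[index]
    if offset > limit:
        # The segment size stored in the descriptor bounds every access.
        raise ValueError("offset exceeds segment limit")
    return (base + offset) & 0xFFFFFFFF


if __name__ == "__main__":
    print(hex(real_mode_linear(0x1234, 0x0010)))
    table = {1: (0x00100000, 0xFFFF)}
    print(hex(protected_mode_linear(table, 0x0008, 0x20)))
```

With paging disabled, the value returned by `protected_mode_linear` would also be the physical address.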
Q2. b) Cache organization
Instruction & Data Cache of Pentium
- Both caches are organized as 2-way set-associative caches with 128 sets (256 entries in total).
- There are 32 bytes in a line (8K/256).
- An LRU algorithm is used to select victims in each cache.
Structure of 8KB instruction and data cache
- Each entry in a set has its own tag.
- Tags in the data cache are triple ported, used for:
  - U pipeline
  - V pipeline
  - Bus snooping
Data Cache of Pentium
- Bus snooping: it is used to maintain consistent data in a multiprocessor system where each processor has a separate cache.
- Each entry in the data cache can be configured for write-through or write-back.
Instruction Cache of Pentium
- Instruction cache is write protected to prevent self-modifying code.
- Tags in the instruction cache are also triple ported:
  - Two ports for split-line accesses
  - Third port for bus snooping
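As a rough illustration of the organization above (8KB, 2-way, 128 sets, 32-byte lines), the sketch below splits a 32-bit address into tag, set index and byte offset. The field widths follow from the figures in the notes (5 offset bits for 32 bytes, 7 index bits for 128 sets, the remaining 20 bits as tag); the function name is hypothetical.

```python
LINE_BYTES = 32   # bytes per cache line
NUM_SETS = 128    # sets per cache (2 ways each, 256 lines in total)


def split_cache_address(addr):
    """Return (tag, set_index, byte_offset) for a 32-bit address."""
    byte_offset = addr & (LINE_BYTES - 1)      # low 5 bits select the byte in the line
    set_index = (addr >> 5) & (NUM_SETS - 1)   # next 7 bits select the set
    tag = addr >> 12                           # remaining upper 20 bits form the tag
    return tag, set_index, byte_offset


if __name__ == "__main__":
    tag, set_index, byte_offset = split_cache_address(0x12345678)
    print(hex(tag), set_index, byte_offset)
```

On a lookup, the tag would be compared against the two tags stored in the selected set; a match in either way is a hit.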
Split-line Access
- In Pentium (since CISC), instructions are of variable length (1-15 bytes).
- Multibyte instructions may straddle two sequential lines stored in the code cache. Fetching such an instruction then requires two sequential accesses, which degrades performance.
- Solution: split-line access. It permits the upper half of one line and the lower half of the next to be fetched from the code cache in one clock cycle.
- When a split line is read, the information is not correctly aligned. The bytes need to be rotated so that the prefetch queue receives the instruction in proper order.
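The condition that triggers a split-line access can be checked with simple arithmetic. The sketch below assumes 32-byte lines and the 1-15 byte instruction lengths mentioned above; the function name is hypothetical and the code only detects the straddle, it does not model the rotation hardware.

```python
LINE_BYTES = 32  # code cache line size in bytes


def straddles_line(addr, length):
    """Return True if an instruction at addr of the given length crosses a line boundary."""
    if not 1 <= length <= 15:
        raise ValueError("Pentium instructions are 1 to 15 bytes long")
    # The instruction crosses into the next line when it runs past the end
    # of the line that contains its first byte.
    return (addr % LINE_BYTES) + length > LINE_BYTES


if __name__ == "__main__":
    print(straddles_line(30, 4))   # starts 2 bytes before the end of a line
    print(straddles_line(28, 4))   # ends exactly at the line boundary
```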
Translation Lookaside Buffers
- They translate virtual addresses to physical addresses.
Data Cache:
- The data cache contains two TLBs.
- First:
  - 4-way set-associative with 64 entries
  - Translates addresses for 4KB pages of main memory
  - The lower 12 bits of the address are the same
  - The upper 20 bits of the virtual address are checked against four tags and translated into the upper 20 bits of the physical address during a hit
  - Since translation needs to be quick, the TLB is kept small
- Second:
  - 4-way set-associative with 8 entries
  - Used to handle 4MB pages
- Both TLBs are parity protected and dual ported.
Instruction Cache:
- Uses a single 4-way set-associative TLB with 32 entries
- Both 4KB and 4MB pages are supported (4MB in 4KB chunks)
- Parity bits are used on tags and data to maintain data integrity
- Entries are placed in all 3 TLBs through the use of a 3-bit LRU counter stored in each set.
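The 4KB-page translation described above (upper 20 bits translated, lower 12 bits passed through unchanged) can be sketched as follows. This is a deliberately simplified model: the TLB is a flat dictionary rather than a 4-way set-associative structure with LRU replacement, and the names are hypothetical.

```python
PAGE_SHIFT = 12                      # 4KB pages: the low 12 bits are the page offset
PAGE_MASK = (1 << PAGE_SHIFT) - 1


def tlb_translate(tlb, virtual_addr):
    """Translate a 32-bit virtual address with a dict {virtual page: physical page}.

    Returns the physical address on a hit, or None on a miss.
    """
    virtual_page = virtual_addr >> PAGE_SHIFT   # upper 20 bits, compared against the tags
    offset = virtual_addr & PAGE_MASK           # lower 12 bits are the same on both sides
    physical_page = tlb.get(virtual_page)
    if physical_page is None:
        return None                             # miss: the page tables must be consulted
    return (physical_page << PAGE_SHIFT) | offset


if __name__ == "__main__":
    tlb = {0x12345: 0x00ABC}
    print(hex(tlb_translate(tlb, 0x12345678)))
    print(tlb_translate(tlb, 0x00001000))
```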
Q3. a) Branch prediction logic
- Other than the superscalar ability of the Pentium processor, the branch prediction mechanism is a much-debated improvement.
- Predicting the behavior of branches can have a very strong impact on the performance of a machine, since a wrong prediction results in a flush of the pipelines and wasted cycles.
- Branch prediction is done through a branch target buffer. The branch target buffer contains the information about all branches.
- The prediction of whether a jump will occur or not is based on the branch's previous behavior. There are four possible states that depict a branch's disposition to jump:
  State 0: Very unlikely a jump will occur
  State 1: Unlikely a jump will occur
  State 2: Likely a jump will occur
  State 3: Very likely a jump will occur
- When a branch has its address in the branch target buffer, its behavior is tracked. This diagram portrays the four states associated with branch prediction.
- If a branch doesn't jump two times in a row, it will go down to State 0.
- Once in State 0, the algorithm won't predict another jump unless the branch jumps two consecutive times (so it will go from State 0 to State 2).
- Once in State 3, the algorithm won't predict another no-jump unless the branch is not taken two consecutive times.
- It is actually believed that the Pentium's algorithm for branch prediction is incorrect. As can be seen in the diagram, State 0 will jump directly to State 3, instead of following the usual path, which would include State 1 and State 2. This abnormality might be attributed to the way in which the branch target buffer operates:
- If a branch is not found in the branch target buffer, then it is predicted that it won't jump.
- A branch won't get an actual entry in the branch target buffer until the first time it jumps, and when it does, it goes straight into State 3.
- Because the branch won't get an entry in the branch target buffer until the first time it jumps, this alters the actual state diagram, as can be clearly seen.
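The four-state scheme above behaves like a 2-bit saturating counter per branch. The sketch below is a minimal model under stated assumptions: existing entries move one state up or down per outcome, a branch with no entry is predicted not taken, and a new entry is created in State 3 the first time the branch jumps, as the notes describe. The class and method names are hypothetical, and the buffer is an unbounded dictionary rather than a fixed-size hardware table.

```python
class BranchTargetBuffer:
    """Toy model of four-state branch prediction keyed by branch address."""

    def __init__(self):
        self.state = {}  # branch address -> state 0..3

    def predict(self, addr):
        """Predict taken only in State 2 or 3; a missing entry predicts not taken."""
        return self.state.get(addr, 0) >= 2

    def update(self, addr, taken):
        """Record the actual outcome of the branch at addr."""
        if addr not in self.state:
            if taken:
                self.state[addr] = 3  # first jump allocates the entry in State 3
            return
        current = self.state[addr]
        # Saturating step: up on taken, down on not taken.
        self.state[addr] = min(3, current + 1) if taken else max(0, current - 1)


if __name__ == "__main__":
    btb = BranchTargetBuffer()
    for outcome in (True, False, False, True, True):
        print(btb.predict(0x400), outcome)
        btb.update(0x400, outcome)
```

Starting from State 3, it takes two consecutive not-taken outcomes before the prediction flips, which matches the behavior listed above.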
Q3. b) Stages of integer pipeline
The Pentium pipelined Integer Unit supports 5 stages:
1) Pre-fetch
2) Decode
3) Address generate
4) EX Execute - ALU and cache access
5) WB Writeback
1) In the Pre-fetch cycle, two pre-fetch buffers read the instructions to be executed. Instructions can be fetched for the U or V pipeline. The U pipeline handles the more complex instructions.
2) In the Decode cycle, two decoders decode the instructions and try to pair them so they can run in parallel, since the Pentium features a superscalar architecture. Even so, for two instructions to run concurrently, like in the diagram below, they need to satisfy some rules. Essentially, the instructions have to be independent; otherwise they cannot be paired.
3) In the second Decode stage, or address generate stage, the addresses of memory operands are calculated. After these calculations, the EX stage of the pipeline is ready to execute.
A floating-point instruction cannot be paired with an integer instruction.
Stages of floating point pipeline
1. Prefetch
2. Instruction Decode 1 (D1)
3. Instruction Decode 2 (D2)
4. EX (execute stage)
5. FP E1 (floating point execution 1)
6. FP E2 (floating point execution 2)
7. Write FP result
8. Error reports
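The pairing condition stated above (the two instructions must be independent, and a floating-point instruction cannot pair with an integer one) can be sketched as a simple check. This is only an illustration: the `Instr` record and its fields are hypothetical, and the real Pentium pairing rules include further restrictions that are not modeled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    dest: str            # register written by the instruction
    srcs: tuple          # registers read by the instruction
    is_fp: bool = False  # True for a floating-point instruction


def can_pair(first, second):
    """Return True if 'second' may issue in the V pipe alongside 'first' in the U pipe."""
    if first.is_fp != second.is_fp:
        return False                  # no floating-point/integer mix
    if first.dest in second.srcs:
        return False                  # second reads what first writes
    if first.dest == second.dest:
        return False                  # both write the same register
    return True


if __name__ == "__main__":
    a = Instr("eax", ("ebx",))
    b = Instr("ecx", ("edx",))
    c = Instr("ecx", ("eax",))
    print(can_pair(a, b), can_pair(a, c))
```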
Q4. a) USB
- USB provides a serial bus standard for connecting peripheral devices to a PC with simplified addition and removal.
- USB can connect peripherals such as mice, keyboards, game pads and joysticks, scanners, digital cameras, printers, external storage, networking components, etc.
- The design of USB is standardized by the USB Implementers Forum (USB-IF), an industry standards body incorporating leading companies from the computer and electronics industries.
History of USB
- USB 1.0 FDR: Released in November 1995, the same year that Apple adopted the IEEE 1394 standard known as FireWire.
- USB 1.0: Released in January 1996.
- USB 1.1: Released in September 1998.
- USB 2.0: Released in April 2000. The major feature of this standard was the addition of high-speed mode. This is the current revision.
- USB 2.0: Revised in December 2002. Added three speed distinctions to this standard, allowing all devices to be USB 2.0 compliant even if they were previously considered only 1.1 or 1.0 compliant.
Features of USB
- Simplifies the connection process and enables instantaneous addition and removal of peripherals.
- The PC acts as host. Once a peripheral is plugged in, the PC automatically detects and configures it.
- Connects peripherals with a 4-wire connection.
- Supports 3 data rates: 480 Mbps (high speed, USB 2.0), 12 Mbps (full speed) and 1.5 Mbps (low speed, used to connect human interface devices such as mouse, keyboard, etc.).
- Many peripherals can be connected to one port using a special peripheral device called a USB hub, through which up to 127 devices can be attached to a single port.
- Peripherals share the available bandwidth through a token-based protocol.
- Conforms to the plug and play specification.
- Distributes power to many low-power peripherals.
- Uses a single interrupt line.
- Replaces the serial and parallel ports.
- Windows 98, 2000 and XP support USB.
- Individual devices can be up to 5 meters away, and with hubs they can be up to 30 meters away.
- The PC controls the peripherals.
- Low cost.
USB pin connections
b) The VESA Local Bus
- The VESA Local Bus, promoted by the Video Electronics Standards Association, was one of the first attempts to overcome the limitations of ISA.
- The VL Bus strategy is to attach the video controller, and possibly other high-bandwidth devices, to the processor's local bus, either directly or through a buffer.
- The direct connection supports only one device; the buffered approach supports up to three devices.
- The VL Bus solved the bandwidth problem (in the short term anyway). On a 33 MHz, 32-bit processor bus, the VL Bus could achieve 132 Mbytes/sec.
- VESA also made an attempt to address the configuration issue by mandating that all VL Bus devices must support automatic configuration. Unfortunately, it did not define a configuration protocol, so every device manufacturer invented its own.
- VESA also did not specify with any precision the electrical characteristics of VL devices. They were just expected to be compatible with the 486 bus.
- But the principal drawback of the VL Bus is that it is processor-specific. As soon as the Pentium came out, it was no longer relevant.
- It acted as a high-speed conduit for memory-mapped I/O and DMA, while the ISA bus handled interrupts and port-mapped I/O.
- It is an expansion slot that provides faster data flow between the devices controlled by the expansion cards and the computer's microprocessor. A "local bus" is a physical path on which data flows at almost the speed of the microprocessor, increasing total system performance. VESA Local Bus is particularly effective in systems with advanced video cards and supports 32-bit data flow at 50 MHz. A VESA Local Bus is implemented by adding a supplemental slot and card that aligns with and augments an Industry Standard Architecture expansion card. (ISA is the most common expansion slot in today's computers.)
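The 132 Mbytes/sec figure quoted above follows from multiplying the bus clock by the bus width in bytes. The short sketch below reproduces that arithmetic; the function name is hypothetical, and the result is a theoretical peak that assumes one transfer per clock with no wait states.

```python
def peak_bandwidth_mbytes_per_s(clock_mhz, width_bits):
    """Theoretical peak transfer rate: one full-width transfer per clock cycle."""
    return clock_mhz * width_bits / 8


if __name__ == "__main__":
    # 33 MHz, 32-bit processor bus, as in the VL Bus example above.
    print(peak_bandwidth_mbytes_per_s(33, 32))
```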
Q5. a) block diagram of 80386DX
The internal architecture of the 80386 is divided into 3 sections:
- Central processing unit
- Memory management unit
- Bus interface unit
The Central processing unit is further divided into the Execution unit and the Instruction unit.
- The Execution unit has 8 general-purpose and 8 special-purpose registers, which are used either for handling data or for calculating offset addresses.
- The Instruction unit decodes the opcode bytes received from the 16-byte instruction code queue and arranges them in a 3-instruction decoded instruction queue.
- After decoding, the instructions are passed to the control section, which derives the necessary control signals. The barrel shifter increases the speed of all shift and rotate operations.
- The multiply/divide logic implements the bit-shift-rotate algorithms to complete the operations in minimum time. Even 32-bit multiplications can be executed within one microsecond by the multiply/divide logic.
The Memory management unit consists of a Segmentation unit and a Paging unit.
- The Segmentation unit allows the use of two address components, viz. segment and offset, for relocatability and sharing of code and data.
- The Segmentation unit allows segments of up to 4 Gbytes.
- The Paging unit organizes the physical memory in pages of 4 Kbytes each.
- The Paging unit works under the control of the Segmentation unit, i.e. each segment is further divided into pages. The virtual memory is also organized in terms of segments and pages by the memory management unit.
- The Segmentation unit provides a 4-level protection mechanism for protecting and isolating the system code and data from those of the application program.
- The Paging unit converts linear addresses into physical addresses.
- The control and attribute PLA checks the privileges at the page level. Each of the pages maintains the paging information of the task. The limit and attribute PLA checks segment limits and attributes at segment level to avoid invalid accesses to code and data in the memory segments.
The Bus control unit has a prioritizer to resolve the priority of the various bus requests. This controls the access to the bus.
- The address driver drives the bus enable and address signals A0 - A31. The pipeline and dynamic bus sizing unit handle the related control signals.
- The data buffers interface the internal data bus with the system bus.
Q5. b) Features of PCI bus
PCI stands for Peripheral Component Interconnect.
- One nice feature of the PCI bus is the complete separation of the bus from the CPU. This allows the bus to be used on different hardware platforms without requiring a change of the processor design.
- To make up for this lack of CPU control, there is the PCI bridge.
- Used within IBM-based PCs and other workstations
- High-performance synchronous bus
- Utilizes multiplexed address and data buses
- System clock rate normally either 33 MHz or 66 MHz
- Specification includes both 32-bit and 64-bit bus widths
- Currently 32-bit clocked at 33 MHz is more common within PCs
- Maximum data transfer rate (burst mode):
  o 133 MB/s for 32-bit bus clocked at 33 MHz
  o 532 MB/s for 64-bit bus clocked at 66 MHz
- Supports multiple masters
- With the PCI bus, the bus lines are shared between data and addresses. Data and addresses are sent alternately over the bus. The PCI bridge can divide the bus between an address and data.
- Another feature is PCI burst:
  o The address is sent, followed by a data block.
  o Addresses are automatically incremented by bridge and adapter.
- Every device on the PCI bus has a definite address.
- The PCI bus itself has an identifier (there could be several PCI buses).
- The slot the PCI card is plugged into has an identifier.
- The device itself has function numbers for each subunit (e.g. PCI SCSI controller).
- Moving the card changes its address. This is why Windows discovers new hardware.
- PCI devices are interesting because the driver must find the device.
- This is a result of different factors, including which slot the card is plugged into and what interrupt the PCI BIOS assigned to the device.
- Initializing the device driver requires finding the device.
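The bus, slot (device) and function identifiers described above are what a driver combines to address one specific device. The sketch below packs them into a 32-bit configuration address; the field widths (8-bit bus, 5-bit device, 3-bit function, dword-aligned register) follow PCI configuration mechanism #1 on PC platforms, which is not covered in these notes and is stated here as an assumption. The function name is hypothetical.

```python
def pci_config_address(bus, device, function, register=0):
    """Pack bus/device/function/register into a 32-bit configuration address."""
    if not (0 <= bus < 256 and 0 <= device < 32 and 0 <= function < 8):
        raise ValueError("bus, device or function number out of range")
    if not 0 <= register < 256:
        raise ValueError("register offset out of range")
    return (0x80000000            # enable bit
            | (bus << 16)         # which PCI bus
            | (device << 11)      # which slot/device on that bus
            | (function << 8)     # which subunit of the device
            | (register & 0xFC))  # dword-aligned register offset


if __name__ == "__main__":
    print(hex(pci_config_address(1, 2, 3, 0x10)))
```

Moving a card to another slot changes the device field, which is one way to see why the same card appears at a new address.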
PCI workstation
Q6. a)
| Sr. No. | SuperSPARC | UltraSPARC |
|---|---|---|
| 1. | SuperSPARC is the version of the SPARC microprocessor which was released in 1992 by Sun Microsystems. | UltraSPARC is the version of the SPARC microprocessor released by Sun Microsystems in 1995, replacing SuperSPARC-II. |
| 2. | SuperSPARC microprocessor uses the SPARC V8 ISA. | It used the V9 ISA of the SPARC architecture. |
| 3. | 3.1 million transistors were contained in SuperSPARC. | It contained 3.8 million transistors. |
| 4. | SuperSPARC microprocessor had an L1 cache of 16KB. Its L2 cache had a capacity of 2MB. L3 cache was not present in the SuperSPARC microprocessor. | There are two levels of cache, primary and secondary. Primary cache is 16KB and secondary cache is 512KB to 4MB. |
| 5. | Lower clock speed | Higher clock speed |
Q7 b) Itanium processor block diagram
[Figure: PCI workstation block diagram. Labels: Coprocessor, CPU, Cache, Main Memory (Processor/Main Memory System); PCI Bridge; PCI Bus with SCSI host adapter, LAN adapter, I/O, Graphics adapter, Audio, Motion Video and Bus Slots; Interface to Expansion Bus; Expansion Bus (ISA/EISA) with Bus Slots]
Instruction format