Upload
jade-barker
View
216
Download
0
Embed Size (px)
Citation preview
CS 270
CS 270: Computer Organization
Course Overview
Instructor: Professor Stephen P. Carl
CS 270
Quote of the Day
640 Kbytes [of main memory] ought to be enough foranybody.
– Bill Gates, 1981
CS 270
Course Perspective Most Systems Courses are Builder-Centric; this Course is
Programmer-Centric How one can become a more effective programming by knowing
more about the underlying system Enable you to:
Write programs that are more reliable and efficient Incorporate features that require hooks into OS
– E.g., concurrency, signal handlers (in CS 428) Not just a course for dedicated hackers
(But - might just bring out the hidden hacker in you) You won’t see most of this material in any other course
CS 270
Course Components Lectures
Higher level concepts
Labs and Assignments The heart of the course 1 - 3 weeks Provide in-depth understanding of an aspect of systems Programming and problem solving
Exams (2 + final) Test your understanding of concepts & mathematical principles
CS 270
Timeliness Grace days
4 for the course Covers scheduling crunch, out-of-town trips, illnesses, minor setbacks Save them until late in the term!
Lateness penalties Once grace days used up, get penalized 10% per day Typically shut off all handins 4 days after due date
Catastrophic events Major illness, death in family, … Work with your professor on plan for getting back on track
Advice Once you start running late, it’s really hard to catch up
CS 270
Cheating What is cheating?
Sharing code: either by copying, retyping, looking at, or supplying a copy of a file.
Coaching: helping your friend to write a lab, line by line. Copying code from previous course or from elsewhere on WWW
Only allowed to use code we supply, or from CS:APP website What is NOT cheating?
Explaining how to use systems or tools. Helping others with high-level design issues.
Penalty for cheating: Case remanded to Honor Council.
Detection of cheating: We do check and our tools for doing this are much better than you
might think
CS 270
Policies: Attendance and Grading Presence in lectures: strongly advised
(cut warning after 2 unexcused absences)
Presence in design/help sessions: voluntary
Exams: weighted 15, 15, 15 (final)
Labs: completed/not completed
Guaranteed: > 90%: A > 80%: B > 70%: C
CS 270
Introduction: Themes and Concepts
A bird’s eye view of computer system organization
Processors and technological progress
Abstraction vs. Reality Internal data representations - it’s all just numbers Why knowing Assembly language is a Good Thing Computer storage is large but not infinite Asymptotic complexity is only part of the story Computers compute, but they also communicate
CS 270
5 main components of any Computing System
Sun Blade Computer
Processor
Computer
Control(“brain”)
Datapath(“brawn”)
Memory
(where programs, data live whenrunning)
Devices
Input
Output
Keyboard, Mouse
Display, Printer
Disk (where programs, data live whennot running)
CS 270
5 main components of Computer Systems
Datapath Performs operations on signals moving through the CPU
Control Circuitry Routes signals into, through, and out of the CPU
Memory Volatile (RAM) Permanent Storage (hard drives, DVD-ROM, etc.)
Input devices Mouse, keyboard, etc.
Output devices Monitor, printers, etc.
CS 270
CS 270
March of Progress: Moore’s Law
Moore’s Law: the number of transistors on a single integrated circuit, also called chip density, doubles every 18 months
Applies to any kind of semiconductor chip: memory (RAM), microprocessors, GPUs, etc.
Trend first described by Intel co-founder Gordon Moore in 1965.
CS 270
The March of Progress: DRAM capacity
year size (Mbit)
1980 0.0625
1983 0.25
1986 1
1989 4
1992 16
1996 64
1998 128
2000 256
2002 512
• Now 1.4X/yr, or 2X every 2 years.• 8000X since 1980!
Size in bits of single-chip Dynamic Random Access Memory (DRAM )
CS 270
The March of Progress: Microprocessor Complexity
Alpha 21264: 15 millionPowerPC 620: 6.9 millionPentium Pro: 5.5 million(1995)Sparc Ultra: 5.2 million
Athlon (K7): 22 Million
Itanium 2: 41 Million
Intel 8088: < 50,000 (1979)
Intel 80486: 1 million (1989)
CS 270
The March of Progress: Processor Performance
0100200300400500600700800900
87 88 89 90 91 92 93 94 95 96 97
DEC Alpha 21264/600
DEC Alpha 5/500
DEC Alpha 5/300
DEC Alpha 4/266
IBM POWER 100
1.54X/yr
Intel P4 2000 MHz(Fall 2001)
year
Perfo
rman
ce measu
re
CS 270
The March of Progress: Dramatic Changes
° Memory• DRAM capacity: 64x size improvement in last decade.
° Processor• Speed: 100X performance in last decade.
° Disk• Capacity doubled every year since 1997: 250X in last decade.
CS 270
The March of Progress: Dramatic Changes
What will the state-of-the-art PC be when you graduate?
• Processor clock speed: 5000 MegaHertz (5.0 GigaHertz)• Memory capacity: 4000 MegaBytes (4.0 GigaBytes)• Disk capacity: 2000 GigaBytes (2.0 TeraBytes)
• Time to learn some new units! • Mega => Giga• Giga => Tera• Tera => Peta• Peta => Exa• Exa => Zetta• Zetta => Yotta = 1024
CS 270
The Death of Progress? The Power Wall
CS 270
Multicore Processors
CS 270
CS 270
Computing Systems as Abstractions
Most CS courses emphasize abstraction Abstract data types (CS 157/257) Asymptotic analysis (CS 320)
Computer Systems are organized according to layers of abstraction. In general, abstraction helps engineers of all sorts to
manage complexity Abstraction helps insulate programmers from differences between
various hardware platforms
CS 270
Our Theme: Abstraction Is Important But Don’t Forget Reality
Abstractions have limits Especially in the presence of bugs Understanding details of the underlying implementations: sometimes,
you just need to
Useful outcomes Become a more effective programmer!
Find and eliminate bugs efficiently Understand and tune for program performance
Preparation for later “systems” classes in CS Operating Systems, Networking
CS 270
Reality #1: Ints are not Integers, Floats are not Reals
Example 1: Is x2 ≥ 0? Floats: Yes! Ints:
40000 * 40000 --> 1600000000 50000 * 50000 --> ??
Example 2: Is (x + y) + z = x + (y + z)? Unsigned & Signed Ints: Yes! Floats:
(1e20 + -1e20) + 3.14 --> 3.14 1e20 + (-1e20 + 3.14) --> ??
CS 270
Code Security Example
Similar to code found in FreeBSD’s implementation of getpeername
There are legions of smart people trying to find vulnerabilities in programs
/* Kernel memory region holding user-accessible data */#define KSIZE 1024char kbuf[KSIZE];
/* Copy at most maxlen bytes from kernel region to user buffer */int copy_from_kernel(void *user_dest, int maxlen) { /* Byte count len is minimum of buffer size and maxlen */ int len = KSIZE < maxlen ? KSIZE : maxlen; memcpy(user_dest, kbuf, len); return len;}
CS 270
Typical Usage/* Kernel memory region holding user-accessible data */#define KSIZE 1024char kbuf[KSIZE];
/* Copy at most maxlen bytes from kernel region to user buffer */int copy_from_kernel(void *user_dest, int maxlen) { /* Byte count len is minimum of buffer size and maxlen */ int len = KSIZE < maxlen ? KSIZE : maxlen; memcpy(user_dest, kbuf, len); return len;}
#define MSIZE 528
void getstuff() { char mybuf[MSIZE]; copy_from_kernel(mybuf, MSIZE); printf(“%s\n”, mybuf);}
CS 270
Malicious Usage/* Kernel memory region holding user-accessible data */#define KSIZE 1024char kbuf[KSIZE];
/* Copy at most maxlen bytes from kernel region to user buffer */int copy_from_kernel(void *user_dest, int maxlen) { /* Byte count len is minimum of buffer size and maxlen */ int len = KSIZE < maxlen ? KSIZE : maxlen; memcpy(user_dest, kbuf, len); return len;}
#define MSIZE 528
void getstuff() { char mybuf[MSIZE]; copy_from_kernel(mybuf, -MSIZE); . . .}
CS 270
Computer Arithmetic Arithmetic operations have important mathematical properties
Cannot assume all the usual mathematical properties, due to finiteness of representations Integer operations satisfy “ring” properties
Commutativity, associativity, distributivity Floating point operations satisfy “ordering” properties
Monotonicity, values of signs
Observation Need to understand which abstractions apply in which contexts Important issues for compiler writers and serious application programmers
CS 270
Great (Grim?) Reality #2: You Need to Know Assembly Chances are, you’ll never write program in assembly
Compilers are much better & more patient than you are
But understanding assembly is key to understanding the machine-level execution model. When bugs happen, high-level language model breaks down Tuning program performance
What optimizations are done/not done by the compiler? What are the sources of program inefficiency?
Implementing system software Compiler has machine code as target Operating systems must manage process state
Creating / fighting malware x86 assembly is the language of choice!
CS 270
Assembly Code Example Time Stamp Counter
Special 64-bit register in Intel-compatible machines Incremented every clock cycle Read with rdtsc instruction
Application Measure time (in clock cycles) required by a procedure:
double t;start_counter();P(); // function to be timedt = get_counter();printf("P required %f clock cycles\n", t);
CS 270
Code to Read Counter Add a small amount of assembly code using GCC’s asm facility This inserts assembly code into machine code generated by compiler:
static unsigned cyc_hi = 0;static unsigned cyc_lo = 0;
/* Set *hi and *lo to the high and low order bits of the cycle counter. */
void access_counter(unsigned *hi, unsigned *lo){ asm("rdtsc; movl %%edx,%0; movl %%eax,%1"
: "=r" (*hi), "=r" (*lo) :: "%edx", "%eax");
}
CS 270
Grim Reality #3: “Random Access Memory” is a non-physical abstraction Memory is not unbounded
It must be allocated and managed Many applications are memory dominated
Bugs due to memory referencing errors are especially pernicious Effects are distant in both time (error may show up long after erroneous
instruction executes) and space (effect may be outside the bounds of the executing program)
Memory performance is not uniform Cache and virtual memory effects can greatly affect program performance Adapting program to characteristics of memory system can lead to major
speed improvements
CS 270
Example of a Memory Referencing Bug
double fun(int i){ volatile double d[1] = {3.14}; volatile long int a[2]; a[i] = 1073741824; /* Possibly out of bounds */ return d[0];}
fun(0) –> 3.14fun(1) –> 3.14fun(2) –> 3.1399998664856fun(3) –> 2.00000061035156fun(4) –> 3.14, then segmentation fault
CS 270
Memory Referencing Bug Exampledouble fun(int i){ volatile double d[1] = {3.14}; volatile long int a[2]; a[i] = 1073741824; /* Possibly out of bounds */ return d[0];}
fun(0) –> 3.14fun(1) –> 3.14fun(2) –> 3.1399998664856fun(3) –> 2.00000061035156fun(4) –> 3.14, then segmentation fault
Saved State
MSB of d[0]
LSB of d[0]
a[1]
a[0] 0
1
2
3
4
Location accessed by fun(i)
Explanation:
CS 270
Memory Referencing Errors Unlike Java, C and C++ do not protect memory at all
Out of bounds array references Invalid pointer values Abuses of malloc/free (e.g., memory leaks)
This usually leads to nasty bugs Whether or not bug has any effect depends on system and compiler Action at a distance
Corrupted object logically unrelated to one being accessed Effect of bug may be first observed long after it is generated
How can I deal with this? Program in Scheme, Java or Python Understand what possible interactions may occur Use tools that detect referencing errors
CS 270
Memory System Performance Example
Hierarchical memory organization Performance depends on access patterns
Including how we step through multi-dimensional arrays
void copyji(int src[2048][2048], int dst[2048][2048]){ int i,j; // access in column-major order for (j = 0; j < 2048; j++) for (i = 0; i < 2048; i++) dst[i][j] = src[i][j];}
void copyij(int src[2048][2048], int dst[2048][2048]){ int i,j; // access in row-major order for (i = 0; i < 2048; i++) for (j = 0; j < 2048; j++) dst[i][j] = src[i][j];}
21 times sloweron Pentium 4
CS 270
The Memory Mountain
s1
s3
s5
s7
s9
s11
s13
s15
8m
2m 512k 12
8k 32k 8k
2k
0
200
400
600
800
1000
1200
Read throughput (MB/s)
Stride (words) Working set size (bytes)
Pentium III Xeon550 MHz
16 KB on-chip L1 d-cache16 KB on-chip L1 i-cache
512 KB off-chip unifiedL2 cache
L1
L2
Mem
CS 270
Reality #4: There’s more to system performance than asymptotic complexity
Constant factors matter too!
And even exact op count does not predict performance Easily see 10:1 performance range depending on how code was written Must optimize at multiple levels: algorithm, data representations,
procedures, and loops
Must understand system to optimize performance How are programs compiled and executed How do we measure program performance and identify bottlenecks How do we improve performance without destroying code modularity and
generality
CS 270
Example: Matrix Multiplication
Standard desktop computer, vendor compiler, using optimization flags Both implementations have exactly the same operations count (2n3) What is going on?
0
5
10
15
20
25
30
35
40
45
50
0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 8,000 9,000
matrix size
Matrix-Matrix Multiplication (MMM) on 2 x Core 2 Duo 3 GHz (double precision)Gflop/s
160x
Triple loop
Best code (K. Goto)
CS 270
MMM Plot: Analysis
0
5
10
15
20
25
30
35
40
45
50
0 1,000 2,000 3,000 4,000 5,000 6,000 7,000 8,000 9,000
matrix size
Matrix-Matrix Multiplication (MMM) on 2 x Core 2 Duo 3 GHzGflop/s
Memory hierarchy and other optimizations: 20x
Vector instructions: 4x
Multiple threads: 4x
Each speedup due to taking advantage of increasingly complex system resources
Effect: less register spills, less L1/L2 cache misses, less TLB misses
CS 270
Reality #5: Computers do more than just execute programs
They need to get data in and out I/O system is critical to program reliability and performance
They communicate with each other over networks Many system-level issues arise in presence of network
Concurrent operations by autonomous processes Coping with unreliable media Cross platform compatibility Complex performance issues
CS 270
Slide Acknowledgements
Slides to accompany textbook due to Bryant and O’Hallaron
Technology Trends slides due to Dr. XXX Carle of UC/Berkeley
Some graphs from Hennesey and Patterson: Computer Organization and Design, 4th Ed.