Chapter 2: How are data represented?


Why do we care?

The accuracy of our results
The speed of processing
The range of alphabets available to us
The size of the files we must store
The quality of graphics on screen and on paper
The time it takes for Internet downloads

Why do computers work in binary?

Cheapest and simplest in design and engineering
Switch: on = 1; off = 0
Circuit: voltages
– 1.7 volts or higher: 1
– 0.0 volts to 1.3 volts: 0
– Voltages between 1.3 and 1.7 volts are avoided in design
Mathematics: binary numbers
– Using digits 0 and 1 only.

Decimal vs. Binary

Decimal # system
– 10 symbols: 1, 2, 3, … 9, 0
– Base = 10 (we have 10 fingers)
– Decimal number 2324 reads “2 thousands 3 hundreds twenty four”.
Binary # system
– 2 symbols: 0 and 1
– Base = 2
– Binary number 1101 = ?

Decimal vs. Binary

Decimal # System:
Digits:                  2       3       2       4
Position values (base):  10^3    10^2    10^1    10^0
Position values:         1000    100     10      1
Each digit represents:   2*1000  3*100   2*10    4*1
Value in Decimal: 2*1000 + 3*100 + 2*10 + 4*1 = 2324D

Binary # System:
Digits:                  1       1       0       1
Position values (base):  2^3     2^2     2^1     2^0
Position values:         8       4       2       1
Each digit represents:   1*8     1*4     0*2     1*1
Value in Decimal: 1*8 + 1*4 + 0*2 + 1*1 = 13D

Why do computers work in binary?

Binary digits: bits
8 bits = 1 byte
2^10 bytes = 1024 bytes = 1 kilobyte = 1KB
2^20 bytes = 2^10 KB = 1 megabyte = 1MB
2^30 bytes = 2^10 MB = 1 gigabyte = 1GB
2^40 bytes = 2^10 GB = 1 terabyte = 1TB

Types of data

Instructions
– Computer instructions are coded in sequences of 0’s and 1’s
Numbers
– 2324, -34.35, 34567890123.12345
Characters and symbols
– A, B, C, … Z, a, b, c, … z
– 0, 1, 2, 3 … 9, +, -, ), (, *, &, etc.
Images
– Photos, charts, drawings
Audio
– Sound, music, etc.
Video
– Video clips and movies

Representation of Numbers

Fixed-size-storage approach:
– Computers allocate a specified amount of space for a number
Integers
1 bit: 0 to 1
2 bits: 00, 01, 10, 11 (0 to 3)
4 bits: 0000, 0001, 0010, … 1111 (0 to 15)
1 byte: 0 to 255 (unsigned)
2 bytes: -32768 to +32767 (signed)
4 bytes: -2,147,483,648 to +2,147,483,647 (signed)
Note: with 4 bytes for integers, any number smaller than -2,147,483,648 or larger than 2,147,483,647 would be incorrectly represented.

Representation of Numbers

Binary representation of real numbers

Binary # System:
Digits:                  1     0     .   1       1        1
Position values (base):  2^1   2^0       2^-1    2^-2     2^-3
Position values:         2     1         1/2     1/4      1/8
Each digit represents:   1*2   0*1       1*0.5   1*0.25   1*0.125
Value in Decimal: 2 + 1/2 + 1/4 + 1/8 = 2.875D


Floating-point numbers for real numbers
– Three parts of representation:
  1. Sign (always 1 bit: 0 for + and 1 for -)
  2. Significant digits (e.g., six bits)
  3. The power of 2 for the leftmost digit (e.g., 3 bits)
– Example for binary -1111.01
  Sign: 1 (negative)
  Significant digits: 111101B
  Power of 2: 011B
– Example for binary +100.1101B
  Sign: 0 (positive)
  Significant digits: 100110B
  – Note: the last digit is lost, which is 1/16 in decimal
  Power of 2: 010B


Single-precision floating-point numbers
1. Sign (always 1 bit: 0 for + and 1 for -)
2. Significant digits: 23 bits
3. Exponent: 8 bits

Double-precision floating-point numbers
1. Sign (always 1 bit: 0 for + and 1 for -)
2. Significant digits: 52 bits
3. Exponent: 11 bits

What should you know?
– Computers can represent numbers only with limited accuracy.
  E.g., when you enter a 20-digit decimal # into a program that uses single-precision, only about 7 digits are actually stored; the rest are lost.
– Real examples:
  Designing aircraft on p. 35
  The Vancouver Stock Exchange Index on pp. 38-39

Representation of Numbers

// file: public_html/2005f-html/cil102/accuracy.c
#include <stdio.h>

int main(void) {
    // x, y, and result all use 32 bits to represent integers
    // (-2,147,483,648 to +2,147,483,647)
    int x, y, result;
    char op;
    int i;

    for (i = 0; i < 100; i++) {
        printf("please enter an expression:\n");
        // Stop if the input is not of the form "number operator number".
        if (scanf("%d %c %d", &x, &op, &y) != 3)
            break;

        if (op == '+')
            result = x + y;   // may exceed the 32-bit range
        else if (op == '-')
            result = x - y;
        else {
            printf("Invalid operator!!\n");
            break;
        }
        printf("%d %c %d = %d\n", x, op, y, result);
    }
    return 0;
}
// When you enter 2000000000 + 500000000, the result is -1794967296


Variable-size-storage approach:
– Allows a wide range of numbers to be stored accurately
– Needs significantly more time to process
– The fixed-size approach is used more commonly than the variable-size approach.

Representation of characters

There are no visual letters A, B, C, etc. stored in computers like we have in mind.
Letters and symbols are encoded in 8 bits (one byte) of 0’s and 1’s.
– The keyboard converts keys A, B, C, etc. to their corresponding codes, and
– the monitor converts the codes into visual letters A, B, C, etc. on screen.
Two commonly used coding schemes:
– ASCII: American Standard Code for Information Interchange
– EBCDIC: Extended Binary Coded Decimal Interchange Code

Representation of characters

Character    EBCDIC      ASCII
A            11000001    01000001
B            11000010    01000010
a            10000001    01100001
b            10000010    01100010
0            11110000    00110000
1            11110001    00110001
2            11110010    00110010
, (comma)    01101011    00101100
- (dash)     01100000    00101101

Representation of characters

Foreign characters: two approaches
– Use one byte per char. Ex.,
  – ISO-8859-1 for Western (Roman)
  – ISO-8859-7 for Greek
  – ISO-2022-CN for simplified Chinese
  Webpage: using “META charset=…” to specify which encoding is used.
– Use two bytes per char/symbol
  16 bits have 65,536 combinations (characters)
  Unicode coding system

Representation of Images

A picture is treated as a matrix of dots, called pixels.
The pixels are so small and close together we cannot really see them as separate dots.
Resolution: dots per inch (dpi)
– 72 dpi for Web images
– 600 or 1200 dpi for professional printers or home photo printers


The color of each pixel is represented using bits.
Black/White: one bit per pixel
– 1-white and 0-black
Gray scale: one byte per pixel
– 256 different degrees of gray (00000000 to 11111111)
– 00000000 black, 01111111 intermediate gray, 11111111 white
Color: three bytes per pixel
– Red, green, blue color
– One byte for the intensity of each of the three colors
– 256 possible red, 256 green, 256 blue
  Pure red: 11111111 for red byte, 00000000 for green and blue
  White: 11111111 for all three bytes
  Black: 00000000 for all three bytes


Image storage -- size

Gray scale: one byte per pixel
E.g., a 3 X 5 inch picture with 300 dpi resolution
  3 * 300 = 900 pixels per column
  5 * 300 = 1500 pixels per row
  900 * 1500 = 1,350,000 pixels/picture
  Needed storage = 1,350,000 bytes/picture = about 1.3MB/picture
Color: three bytes per pixel
E.g., a 3 X 5 inch picture with 300 dpi resolution
  3 * 300 = 900 pixels per column
  5 * 300 = 1500 pixels per row
  900 * 1500 = 1,350,000 pixels/picture
  Needed storage = 3 (bytes per pixel) * 1,350,000 = 4,050,000 bytes/picture = about 4MB/picture ---- TOO BIG


Image compression

Color table
– Most pictures contain a small # of different colors
– Use a table to define the colors that are actually used in the picture
– Each pixel has an index to the color table.
– Each image contains a color table and table indices
– Example:
  For a picture with 100 different colors, the color table would contain 100 entries, three bytes per entry for each color. One byte can be used as the index to the table for each pixel.


Drawing commands
– Draw the picture using basic commands
– Just as an artist draws using a pencil or a brush and other basic movements
– Example:
  A house is drawn by sketching various elements (doors, windows, walls), adding color to them, and moving them to the desired position.


Data averaging or sampling
– Condense the size by selecting a smaller collection of information to store.
– Many different ways of sampling and data averaging
– An example: choose to store only every other pixel in an image (sampling), reducing the size to half. To display the full picture, the computer needs to fill in the missing data with, for example, the average of neighboring pixels (data averaging)
– The resulting picture cannot be as sharp as the original
– Lossy data compression

What are “.gif”, “.ps”, “.jpg”, “.bmp” formats?

Commonly used image file formats - 1
– Bitmap (.bmp)
  Pixel-by-pixel storage of all color information for each pixel.
  Lossless representation
  Files are huge.
– Graphics Interchange Format (.gif)
  Uses one or more color tables (the color table technique)
  Each table contains up to 256 colors.
  Suitable for pictures with a small # (<256) of different colors (e.g., organization charts)
  Not suitable for pictures with shading (e.g., photos)

What are “.gif”, “.ps”, “.jpg”, “.bmp” formats?

Commonly used image file formats - 2
– PostScript (.ps)
  Employs the drawing commands technique
  “lineto” draws a line from the current position to a new one and “arc” draws an arc given its center, radius, etc.
  General shapes can be used in multiple places
  Fonts can be reused.
  Useful when the picture can be rendered as a drawing or it contains many of the same elements (e.g., text of the same fonts)
– Joint Photographic Experts Group (JPEG) (.jpg)
  Uses data averaging and sampling on 8*8 pixel blocks
  User determines the level of detail and clarity
  High-quality image: 8*8 blocks maintain their contents
  Low-quality image: info in 8*8 blocks is discarded, giving smaller files

Comparison b/w jpg, gif, and ps

Pictures in the textbook
http://www.cs.grinnell.edu/~walker/fluency-book/figures/chapter2/fig-2-overview.html
Comparison of .jpg and .gif
http://www.siriusweb.com/tutorials/gifvsjpg/
More on .jpg and .gif
http://www.wfu.edu/~matthews/misc/jpg_vs_gif/JpgVsGif.html



Summary of Image Representations

Other commonly used formats
– TIFF: Tagged Image File Format
– PNG: Portable Network Graphics
– New formats will emerge
Understand the format and know the pros and cons
To learn: Google the format
Use programs (GIMP) to convert b/w formats

Summary: chapter 2

Computers work in binary
Integers may be constrained in size
Real numbers may have limited accuracy
Computations may produce roundoff errors, affecting accuracy
Characters and languages are encoded in binary
Pictures are displayed pixel by pixel
Color table, drawing commands, and data averaging and sampling compression techniques
.bmp, .jpg, .gif, .ps formats

Terminology

Binary vs. decimal
Position value
The base of a # system
Bit/byte/KB/MB/GB/TB
Integer binary #s
Real # in binary
Floating point numbers
Representational error
Roundoff errors
ASCII/EBCDIC/Unicode
Pixels
Dots per inch (dpi)
Bitmap
Color table
Data averaging
Data sampling
Data compression
.jpg, .bmp, .gif, .ps
