Data Representation


Contents

This lecture will address:

◦ Several different number systems.

◦ Data formats:

Alphanumeric characters.

Image data.

Audio data.

Data compression.

Internal computer data formats.

◦ Representing integer data.

◦ Floating-point numbers.

Number Systems

Computers perform all of their operations using binary (base 2).

◦ Program code and data are stored and manipulated in binary.

◦ Each digit in a binary number is known as a bit (value 0 or 1).

◦ Bits are commonly stored and manipulated in groups of:

8 bits: byte.

16 bits: halfword.

32 bits: word.

64 bits: doubleword.

(These names assume a 32-bit word; the size of a word varies between architectures.)


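◦ The number of bits used in calculations affects the accuracy of the results and limits the size of the numbers that can be represented.

◦ In a programming language, a programmer can define a signed integer variable to be (sizes as defined in Java; in C and C++ these sizes can vary between platforms):

short (16 bit)

int (32 bit)

long (64 bit).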
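The range of each signed type follows from its width: an n-bit two's complement integer holds values from $-2^{n-1}$ to $2^{n-1}-1$. A minimal Python sketch (the helper name `signed_range` is hypothetical) that computes these limits for the three widths listed above:

```python
def signed_range(bits: int) -> tuple[int, int]:
    """Smallest and largest value of a signed two's complement integer of `bits` bits."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


for name, bits in [("short", 16), ("int", 32), ("long", 64)]:
    low, high = signed_range(bits)
    print(f"{name:5} ({bits} bit): {low} to {high}")
```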


Common number systems used when working with computers include:

◦ base 2 (binary)

◦ base 10 (decimal)

◦ base 8 (octal)

◦ base 16 (hexadecimal)

Number Systems: Counting in Different Bases

Base 10:

0,1,2,3,4,5,6,7,8,9,10,11,12,…99,100….

Base 8:

0,1,2,3,4,5,6,7,10,11,12,…17,20,…77,100,..

Base 16:

0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F,10,11…FF,100,..

Base 2:

0, 1, 10, 11, 100, 101, 110, 111, 1000, …….
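Python's built-in `format` function can write integers in base 8 (`"o"`), base 16 (`"X"`) and base 2 (`"b"`), which makes it easy to reproduce the counting sequences above. A short sketch; `count_in` is a hypothetical helper:

```python
def count_in(spec: str, upto: int) -> list[str]:
    """Return the numbers 0..upto written with the given format spec ('o', 'X' or 'b')."""
    return [format(n, spec) for n in range(upto + 1)]


for label, spec in [("Base 8", "o"), ("Base 16", "X"), ("Base 2", "b")]:
    print(f"{label}: {', '.join(count_in(spec, 16))}")
```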

Numeric Conversion between Bases

Convert the number to base 10.

◦ E.g. $13754_8 = ?_{10}$

$$(1\times8^4)+(3\times8^3)+(7\times8^2)+(5\times8^1)+(4\times8^0) = 6124_{10}$$

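Other method (multiply and add): start with the leftmost digit, multiply the running total by 8 and add the next digit; the last digit is added without multiplying.

$$\begin{aligned}
1\times8 &= 8\\
(8+3)\times8 &= 88\\
(88+7)\times8 &= 760\\
(760+5)\times8 &= 6120\\
6120+4 &= 6124_{10}
\end{aligned}$$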
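The multiply-and-add method above (often called Horner's method) can be sketched in Python as follows. The function name `to_decimal` is a hypothetical helper, and digits above 9 are assumed to be written as the letters A to Z; Python's built-in `int(text, base)` performs the same conversion.

```python
def to_decimal(digits: str, base: int) -> int:
    """Convert a digit string in the given base (2..36) to an integer
    using multiply-and-add, one digit at a time from the left."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    if not digits:
        raise ValueError("empty digit string")
    value = 0
    for ch in digits:
        d = int(ch, 36)  # digit value: '0'-'9' -> 0-9, 'A'/'a'-'Z'/'z' -> 10-35
        if d >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        value = value * base + d  # shift the running total one place, add the digit
    return value


print(to_decimal("13754", 8))  # 6124
```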


Convert the number from base 10.

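◦ E.g. $6124_{10} = ?_5$

Divide repeatedly by 5 and record the remainders:

$$\begin{array}{rcl|c}
 & & & \text{remainder}\\
6124 \div 5 &=& 1224 & 4\\
1224 \div 5 &=& 244 & 4\\
244 \div 5 &=& 48 & 4\\
48 \div 5 &=& 9 & 3\\
9 \div 5 &=& 1 & 4\\
1 \div 5 &=& 0 & 1
\end{array}$$

Reading the remainders from the last division back to the first gives $6124_{10} = 143444_5$.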
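The repeated-division method can be sketched as below; `from_decimal` and `DIGITS` are hypothetical names. Python's `format` only covers bases 2, 8 and 16, so a loop like this is needed for other bases such as 5.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def from_decimal(n: int, base: int) -> str:
    """Convert a non-negative integer to a digit string in `base` (2..36)
    by repeated division; the remainders are the digits, least significant first."""
    if not 2 <= base <= 36:
        raise ValueError("base must be between 2 and 36")
    if n < 0:
        raise ValueError("negative numbers are not handled in this sketch")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        n, remainder = divmod(n, base)
        digits.append(DIGITS[remainder])
    return "".join(reversed(digits))  # read remainders from last to first


print(from_decimal(6124, 5))  # 143444
```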

Convert Binary Number to Hex

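E.g. 0011 0101 1101 1000

Group the bits into groups of 4 digits, starting from the right, and replace each group with one hex digit:

0011 → 3, 0101 → 5, 1101 → D, 1000 → 8

Result: $35D8_{16}$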

Most computer manufacturers prefer to use hexadecimal, since a 16-bit or 32-bit number can be represented exactly by a four- or eight-digit hex number.

Conversion between binary and hex is used frequently.
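The grouping rule translates directly into code. A sketch with a hypothetical helper `bin_to_hex`, which pads the leftmost group with zeros when the number of bits is not a multiple of 4:

```python
def bin_to_hex(bits: str) -> str:
    """Convert a binary string (spaces allowed) to hex by replacing
    each group of 4 bits, counted from the right, with one hex digit."""
    bits = bits.replace(" ", "")
    if not bits or set(bits) - {"0", "1"}:
        raise ValueError("expected a non-empty string of 0s and 1s")
    bits = bits.zfill(-(-len(bits) // 4) * 4)  # pad on the left to a multiple of 4
    groups = [bits[i:i + 4] for i in range(0, len(bits), 4)]
    return "".join(f"{int(group, 2):X}" for group in groups)


print(bin_to_hex("0011 0101 1101 1000"))  # 35D8
```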

Data Formats

Since all data and code in a computer are binary, it is almost always necessary to convert our words, numbers, images and sounds into a different form in order to store and process them in the computer.

Original data (characters, images, etc.) must first be brought into the computer and converted into an appropriate computer representation so that it can be processed, stored and used within the computer system.


Different input devices are used for converting original data into computer format.

◦ Keyboard: generates a binary code for each key.

◦ Microphone: converts analog sound into binary data using an analog-to-digital converter (ADC).

◦ Camera: converts an analog picture into binary data using an ADC.

◦ Etc.

There must be agreement between input and output devices so that the data is displayed correctly.

If necessary, translation programs can be used to translate from one representation to another. Example:

◦ Data from the keyboard enters the computer as a stream of characters; numeric characters must be translated into a number before they can be used in calculations.

For storage and transmission of data, a representation different from the one used for internal processing is often necessary.

◦ For example, in addition to the actual data representing the points in an image, the system must also store and pass along information that describes or interprets the meaning of the data.

◦ This information is known as metadata.

◦ E.g. for a graphic image: the type of graphical image, the colour format, etc.


Individual programs can store and process data in any format they want. The formats used by individual programs are known as proprietary formats.

However, standard data representations exist to serve as interfaces between different programs, between programs and I/O devices, between interconnected hardware, and between systems that share data.


Many different standards are in use for different types of data. Some common data representations are:

Type of data: Standard(s)

Alphanumeric: Unicode, ASCII, EBCDIC

Image (bitmap): GIF (Graphics Interchange Format), TIFF (Tagged Image File Format), PNG (Portable Network Graphics), JPEG

Image (object): PostScript, SWF (Macromedia Flash), SVG

Outline graphics and fonts: PostScript, TrueType

Sound: WAV, AVI, MP3, MIDI, WMA

Page description: PDF (Adobe Portable Document Format), HTML, XML

Video: QuickTime, MPEG-2, WMV

Alphanumeric Character Data

Characters, number digits, and punctuation are known as alphanumeric data.

Since there is no processing capability in the keyboard itself, numeric data must be entered into the computer just like other characters, one digit at a time.

◦ Conversion to a number is done in software.
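A sketch of that software conversion, assuming the digits arrive as ASCII characters (codes 30 hex to 39 hex for '0' to '9'). Each character's code is reduced to a digit value by subtracting the code for '0', and the digits are combined by multiplying the running total by 10; `parse_decimal` is a hypothetical name.

```python
def parse_decimal(chars: str) -> int:
    """Turn a stream of ASCII digit characters into an integer."""
    if not chars:
        raise ValueError("no digits")
    value = 0
    for ch in chars:
        if not "0" <= ch <= "9":
            raise ValueError(f"not a decimal digit: {ch!r}")
        value = value * 10 + (ord(ch) - ord("0"))  # '0' is 0x30 in ASCII
    return value


print([hex(ord(ch)) for ch in "6124"])  # codes as typed: ['0x36', '0x31', '0x32', '0x34']
print(parse_decimal("6124"))            # 6124, now a binary integer
```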

Alphanumeric data must be stored and processed within the computer in binary form, so each character is translated into a binary code.

◦ The choice of code is arbitrary.

◦ Three common alphanumeric codes:

Unicode

ASCII (American Standard Code for Information Interchange)

EBCDIC (Extended Binary Coded Decimal Interchange Code)

Many computers and terminals use Unicode or ASCII.

ASCII Code Table

ASCII is a 7-bit code, giving 128 entries. The codes in the table are in hex.

ASCII

Note that ASCII is designed so that the codes for the letters are in alphabetical order, so a simple numerical sort on the codes can be used within the computer to perform alphabetization (all uppercase letters sort before all lowercase letters).

The order of codes in the representation table is known as its collating sequence.

There are two classes of codes:

◦ Printing characters – produce output on the screen/printer.

◦ Control characters – used to control the position of the output on the screen/paper, to cause some action to occur (e.g. ringing a bell, deleting a character), etc.
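Because Python compares strings by character code, sorting plain-ASCII words shows the collating sequence at work, including the consequence that every uppercase letter (41 hex to 5A hex) sorts before every lowercase letter (61 hex to 7A hex). The control-character test assumes the 7-bit ASCII ranges (00 to 1F hex, plus 7F hex for DEL); `is_control` is a hypothetical helper.

```python
def is_control(code: int) -> bool:
    """True for the 7-bit ASCII control characters (00-1F hex and 7F hex)."""
    if not 0 <= code <= 0x7F:
        raise ValueError("not a 7-bit ASCII code")
    return code < 0x20 or code == 0x7F


words = ["banana", "Apple", "cherry", "apple"]
print(sorted(words))                            # ['Apple', 'apple', 'banana', 'cherry']
print([f"{ord(ch):02X}" for ch in "Aa"])        # ['41', '61']
print(is_control(0x07), is_control(ord("A")))  # True False (07 hex is BEL)
```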

Control Code Definitions

Except for the position-control characters, control characters are typed by holding down the Control key and striking a character key. The code generated corresponds in table position to the position of the same alphabetic character.

E.g. "Ctrl A" generates SOH (01 hex), because A is at position 41 hex.
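The position rule amounts to subtracting 40 hex from the letter's ASCII code. A sketch under that assumption; `ctrl_code` and the small `CONTROL_NAMES` table are hypothetical and list only a few well-known control codes.

```python
CONTROL_NAMES = {0x01: "SOH", 0x07: "BEL", 0x08: "BS", 0x09: "HT", 0x0A: "LF", 0x0D: "CR"}


def ctrl_code(key: str) -> int:
    """Code produced by Ctrl+key: the character's table position moved
    back by 40 hex (e.g. 'A' at 41 hex -> 01 hex)."""
    code = ord(key.upper())
    if not 0x40 <= code <= 0x5F:
        raise ValueError("Ctrl combinations are defined here only for codes 40-5F hex")
    return code - 0x40


for key in "AGHIJM":
    code = ctrl_code(key)
    print(f"Ctrl {key} -> {code:02X} hex ({CONTROL_NAMES[code]})")
```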

ASCII vs Unicode

Because of the limitations of the 7-bit ASCII code, the American National Standards Institute (ANSI) also extended ASCII to an 8-bit code, known as Latin-I.

Latin-I is an ISO standard (ISO 8859-1).

However, an 8-bit code is still not adequate to represent all the characters in use around the world; Unicode was designed to address this.

The original 16-bit Unicode can represent 65,536 characters, of which approximately 49,000 have been defined.

A more recent standard, Unicode 3.1, supports characters beyond these 65,536, within a code space of 1,114,112 code points.

Unicode is multilingual in the most global sense.
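The difference between an 8-bit code and Unicode shows up when a few characters are encoded. This sketch uses Python's standard `latin-1` and `utf-16-be` codecs; UTF-16 is one of several Unicode encodings and is used here only because it is closest to the original two-byte Unicode, and the helper `describe` is hypothetical. A character outside the first 65,536 code points needs two 16-bit units (a surrogate pair) in UTF-16.

```python
from typing import Optional


def describe(ch: str) -> tuple[str, Optional[str], str]:
    """Return the code point, the Latin-1 byte (None if not representable)
    and the UTF-16 (big-endian) bytes of a single character, as hex."""
    code_point = f"U+{ord(ch):04X}"
    try:
        latin1 = ch.encode("latin-1").hex()
    except UnicodeEncodeError:
        latin1 = None  # outside the 256 characters of the 8-bit code
    utf16 = ch.encode("utf-16-be").hex()
    return code_point, latin1, utf16


for ch in ["A", "é", "€", "中", "😀"]:
    print(describe(ch))

print(0x10FFFF + 1)  # size of the Unicode code space: 1114112
```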

Two-byte Unicode Table

Keyboard Input

When a key is struck on the keyboard, the circuitry in the keyboard generates a binary code, called a scan code.

When the key is released, a different code is generated.

The scan codes are converted to Unicode, ASCII or EBCDIC codes by software within the terminal or PC to which the keyboard is connected.

Advantage of software conversion: the keyboard can easily be changed to correspond to a different language and keyboard layout.
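To illustrate why software conversion makes layout changes easy, the sketch below decodes the same scan codes with two lookup tables. It assumes the IBM PC "scan code set 1", in which releasing a key sends its press code plus 80 hex; the tables cover only four keys and are illustrative, not complete layouts.

```python
# Scan code set 1 "make" (key press) codes for four keys, by physical position.
US_QWERTY = {0x10: "q", 0x11: "w", 0x1E: "a", 0x2C: "z"}
FRENCH_AZERTY = {0x10: "a", 0x11: "z", 0x1E: "q", 0x2C: "w"}


def decode(scan_codes: list[int], layout: dict[int, str]) -> str:
    """Translate a sequence of set-1 scan codes into characters."""
    text = []
    for code in scan_codes:
        if code & 0x80:  # release ("break") code = make code + 80 hex: no character
            continue
        text.append(layout.get(code, "?"))  # '?' for keys missing from this small table
    return "".join(text)


keystrokes = [0x1E, 0x9E, 0x2C, 0xAC]  # press/release the same two physical keys
print(decode(keystrokes, US_QWERTY))      # az
print(decode(keystrokes, FRENCH_AZERTY))  # qw
```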

Alternative Sources of Alphanumeric Input

Optical character recognition:

◦ Scan text with an image scanner and convert the image into alphanumeric data form using optical character recognition (OCR) software.

Bar code readers:

◦ Bar codes represent alphanumeric data. They are read optically using a device called a wand, which converts a visual scan of the code into electrical binary signals that a bar code translation module can read.


Magnetic stripe reader:

◦ Reads alphanumeric data from credit cards and other similar devices.

Voice input:

◦ It is currently possible and practical to digitise audio for use as input data. However, the technology to interpret audio data as voice input and to translate it into alphanumeric form is still primitive.

Image Data

Images used in computers fall into two categories: bitmap images and object images. Different computer representations and processing techniques are used for each category.

Bitmap image (raster image): e.g. photographs and paintings.

◦ Produced by: scanners, digital cameras, video camera frame grabbers, and software programs such as paint programs.

◦ To maintain and reproduce the detail of these images, it is necessary to represent and store each individual point (pixel) within the image.

◦ GIF and JPEG are common bitmap image formats used on the Web.
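A bitmap can be modelled as a grid of pixel values, and its uncompressed size follows directly from width × height × bits per pixel. A minimal sketch, assuming 8-bit grayscale pixels for the grid and ignoring the headers and row padding that real file formats add; the function name is hypothetical.

```python
# A tiny 4 x 3 grayscale bitmap: one 8-bit value (0 = black, 255 = white) per point.
image = [
    [0, 64, 128, 255],
    [64, 128, 255, 128],
    [128, 255, 128, 64],
]


def raw_bitmap_bytes(width: int, height: int, bits_per_pixel: int) -> int:
    """Uncompressed pixel storage, rounded up to whole bytes."""
    return (width * height * bits_per_pixel + 7) // 8


height, width = len(image), len(image[0])
print(image[1][2])                         # pixel at row 1, column 2: 255
print(raw_bitmap_bytes(width, height, 8))  # 12
print(raw_bitmap_bytes(1920, 1080, 24))    # 6220800 bytes for one 24-bit full-HD image
```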


Object image (vector image): made up of graphical shapes such as lines, circles, etc. that can be defined geometrically.

◦ Produced using a drawing or design package.

◦ Example: the animated movies Shrek and Toy Story were built from object (geometric) models.
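By contrast, an object image stores a description of each shape. The sketch below, with a hypothetical `Circle` class, keeps only the centre and radius, so scaling is exact; a bitmap is produced only when the shape is drawn (rasterized) at a chosen resolution, here by testing each pixel centre against the circle equation.

```python
from dataclasses import dataclass


@dataclass
class Circle:
    cx: float  # centre x
    cy: float  # centre y
    r: float   # radius

    def scaled(self, k: float) -> "Circle":
        """Scaling changes only the numbers that describe the shape."""
        return Circle(self.cx * k, self.cy * k, self.r * k)

    def rasterize(self, width: int, height: int) -> list[str]:
        """Draw into a width x height bitmap: '#' if the pixel centre is inside."""
        rows = []
        for y in range(height):
            row = ""
            for x in range(width):
                inside = (x + 0.5 - self.cx) ** 2 + (y + 0.5 - self.cy) ** 2 <= self.r ** 2
                row += "#" if inside else "."
            rows.append(row)
        return rows


small = Circle(4, 4, 3)
print("\n".join(small.rasterize(8, 8)))
print(small.scaled(2))  # Circle(cx=8, cy=8, r=6)
```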

Image Input

Image scanner.

Digital camera.

Video capture devices.

Graphical input using pointing devices.

Audio Data

A few different formats are used for storing audio, e.g.:

◦ .MOD

◦ .MIDI

◦ .VOC

◦ .WAV

◦ .MP3
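Waveform formats store audio as a sequence of numbers produced by an ADC: the signal is sampled at regular intervals and each sample is quantized to a fixed number of bits. This sketch simulates that for a sine wave (the frequency, sample rate and 8-bit depth are arbitrary choices, and the functions are hypothetical helpers) and computes the raw size of uncompressed PCM audio at CD settings.

```python
import math


def sample_and_quantize(freq_hz: float, sample_rate: int, n_samples: int, bits: int) -> list[int]:
    """Sample a sine wave and map each value in [-1, 1] to an integer 0 .. 2**bits - 1."""
    levels = 1 << bits
    samples = []
    for n in range(n_samples):
        t = n / sample_rate                                  # time of the n-th sample
        v = math.sin(2 * math.pi * freq_hz * t)              # "analog" value in [-1, 1]
        samples.append(round((v + 1) / 2 * (levels - 1)))   # quantize
    return samples


def pcm_size_bytes(sample_rate: int, bits: int, channels: int, seconds: int) -> int:
    """Uncompressed size: samples per second x bits per sample x channels x time."""
    return sample_rate * bits * channels * seconds // 8


print(sample_and_quantize(1000, 8000, 8, 8))  # one cycle of a 1 kHz tone, 8 samples
print(pcm_size_bytes(44100, 16, 2, 60))       # 10584000 bytes for one minute of CD-quality stereo
```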

Data Compression

Due to the volume of multimedia data, particularly video but also sound and images, data compression is usually desirable.

Two categories of data compression:

◦ Lossless – allows complete recovery of the original uncompressed data.

◦ Lossy – does not allow exact recovery of the original data, but is designed so that the result is perceived as sufficient by the user.
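Run-length encoding is one simple lossless technique, used here purely as an illustration (it is not claimed to be the method of any format listed earlier): runs of repeated values are stored as (value, count) pairs, and decoding restores the original exactly. For contrast, `drop_low_bits` is a crude lossy step that discards the low-order bits of 8-bit samples, which cannot be undone. All function names are hypothetical.

```python
def rle_encode(data: str) -> list[tuple[str, int]]:
    """Lossless: replace each run of identical characters with (character, run length)."""
    runs: list[tuple[str, int]] = []
    for ch in data:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def rle_decode(runs: list[tuple[str, int]]) -> str:
    return "".join(ch * count for ch, count in runs)


def drop_low_bits(samples: list[int], keep_bits: int) -> list[int]:
    """Lossy: keep only the top `keep_bits` of each 8-bit sample."""
    shift = 8 - keep_bits
    return [(s >> shift) << shift for s in samples]


row = "WWWWWWBBBWWWW"  # one row of a black-and-white image
runs = rle_encode(row)
print(runs)                               # [('W', 6), ('B', 3), ('W', 4)]
print(rle_decode(runs) == row)            # True: complete recovery
print(drop_low_bits([200, 201, 203], 4))  # [192, 192, 192]: the differences are lost
```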

Internal Data Formats

Internally, all data, regardless of use, are stored as binary numbers.

Instructions in the computer support interpretation of these numbers as characters, integers, pointers, and floating-point numbers.

No special provision is made for storing the algebraic sign or decimal point that might be associated with a number.
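To see that the same stored bits mean different things depending on how they are used, the sketch below reinterprets one 32-bit pattern with Python's standard `struct` module as an unsigned integer, a signed (2's complement) integer and an IEEE 754 single-precision number. The pattern C1480000 hex is an arbitrary choice.

```python
import struct

raw = bytes.fromhex("C1480000")  # 4 bytes, most significant byte first

(as_unsigned,) = struct.unpack(">I", raw)  # 32-bit unsigned integer
(as_signed,) = struct.unpack(">i", raw)    # 32-bit 2's complement integer
(as_float,) = struct.unpack(">f", raw)     # IEEE 754 single precision

print(as_unsigned)  # 3242721280
print(as_signed)    # -1052246016
print(as_float)     # -12.5
```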

Representing Integer Data

Unsigned integers can be stored using unsigned binary or binary-coded decimal (BCD).

◦ Unsigned binary – the range of integers that can be stored is determined by the number of bits available; 8-bit binary, for example, can store an unsigned integer with a value between 0 and 255.

For storing larger numbers, multiple 8-bit storage locations are used.

◦ BCD – the number is stored as a digit-by-digit binary representation of the original decimal integer. Each decimal digit is individually converted to 4-bit binary.
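The two unsigned formats can be compared on the number used earlier, 6124. A sketch with hypothetical helper names; note that BCD needs 16 bits for four decimal digits, while plain binary can hold values up to 65,535 in the same 16 bits.

```python
def to_unsigned_binary(n: int, bits: int) -> str:
    """Unsigned binary in a fixed number of bits (range 0 .. 2**bits - 1)."""
    if not 0 <= n < (1 << bits):
        raise ValueError(f"{n} does not fit in {bits} unsigned bits")
    return format(n, f"0{bits}b")


def to_bcd(n: int) -> str:
    """BCD: each decimal digit converted separately to 4 bits."""
    if n < 0:
        raise ValueError("unsigned values only")
    return " ".join(format(int(digit), "04b") for digit in str(n))


print(to_unsigned_binary(255, 8))    # 11111111, the largest 8-bit value
print(to_unsigned_binary(6124, 16))  # 0001011111101100
print(to_bcd(6124))                  # 0110 0001 0010 0100
```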

Storage of a 32-bit Data Word

Representation for Signed Integers

The most common method to represent signed numbers is 2’s complement representation.

The 2’s complement of a number can be found in one of two ways:

◦ Subtract the value from the modulus ($2^n$ for an n-bit number, e.g. 256 for 8 bits), or

◦ Find the 1’s complement by inverting all 1’s and 0’s, and then add 1 to the result (the common method used in computers).

Two’s Complement Representation

Example:

The number +2 in 8 bits is: 0000 0010

The number -2 in 8 bits is: 1111 1110

Derivation (invert the bits of +2, then add 1):

1’s complement: 1111 1101

+ 1

2’s complement: 1111 1110

Check with the modulus: 256 - 2 = 254 = 1111 1110.
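Both methods, and the reverse step of reading a pattern back as a signed value, fit in a few lines. A sketch with hypothetical helper names; the mask keeps each result to the chosen number of bits, just as the hardware discards the carry out of the top bit.

```python
def twos_complement_by_inversion(value: int, bits: int = 8) -> int:
    """Invert every bit (1's complement), then add 1; keep only `bits` bits."""
    mask = (1 << bits) - 1
    ones = ~value & mask
    return (ones + 1) & mask


def twos_complement_by_modulus(value: int, bits: int = 8) -> int:
    """Subtract the value from the modulus 2**bits."""
    return ((1 << bits) - value) & ((1 << bits) - 1)


def to_signed(pattern: int, bits: int = 8) -> int:
    """Read a bit pattern as a 2's complement number: a leading 1 means negative."""
    return pattern - (1 << bits) if pattern & (1 << (bits - 1)) else pattern


neg2 = twos_complement_by_inversion(2)
print(format(~2 & 0xFF, "08b"))  # 11111101 (1's complement)
print(format(neg2, "08b"))       # 11111110
print(neg2 == twos_complement_by_modulus(2), to_signed(neg2))  # True -2
```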

Floating Point Representation

In computing, floating point describes a method of representing an approximation to real numbers in a way that can support a wide range of values.

Numbers are, in general, represented approximately to a fixed number of significant digits (the significand S) and scaled using an exponent E of a base B:

$$\pm S \times B^{\pm E}$$

Typical 32-bit Floating Point Formats

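Example: the same value can be written with the binary point in different positions by adjusting the exponent:

$$0.1110_2 \times 2^5 = 1110_2 \times 2^1 = 0.01110_2 \times 2^6 = 11100_2 = 28_{10}$$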

IEEE 754 Format

Example:

Convert 0.085 to single-precision format.

The first step is to look at the sign of the number. Because 0.085 is positive, the sign bit = 0, and $(-1)^0 = 1$.

Write 0.085 in base-2 scientific notation. This means that we must factor it into a number n in the range $1 \le n < 2$ and a power of 2:

$$0.085 = (-1)^0 \times (1+\text{fraction}) \times 2^{\text{power}}, \quad\text{or}\quad \frac{0.085}{2^{\text{power}}} = 1+\text{fraction}$$

So we can divide 0.085 by a power of 2 to get $(1+\text{fraction})$:

$$\begin{aligned}
0.085 / 2^{-1} &= 0.17\\
0.085 / 2^{-2} &= 0.34\\
0.085 / 2^{-3} &= 0.68\\
0.085 / 2^{-4} &= 1.36
\end{aligned}$$

Therefore, $0.085 = 1.36 \times 2^{-4}$.

Find the exponent.

The power of 2 is -4, and the bias for the single-precision format is 127. This means that the exponent = $-4 + 127 = 123_{10}$, or $01111011_2$.
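The division steps and the bias calculation can be checked with a short sketch (hypothetical helper `normalize`, positive inputs only). Multiplying or dividing by 2 is exact in binary floating point for values like these, so the loop finds the power of 2 without adding rounding error.

```python
BIAS = 127  # single-precision exponent bias


def normalize(x: float) -> tuple[float, int]:
    """Return (significand, power) with 1 <= significand < 2 and x == significand * 2**power."""
    if x <= 0:
        raise ValueError("this sketch handles positive numbers only")
    significand, power = x, 0
    while significand >= 2:
        significand /= 2
        power += 1
    while significand < 1:
        significand *= 2  # same as dividing x by one more negative power of 2
        power -= 1
    return significand, power


significand, power = normalize(0.085)
biased = power + BIAS
print(significand, power)             # approximately 1.36, -4
print(biased, format(biased, "08b"))  # 123 01111011
```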

Write the fraction in binary form.

The fraction = 0.36. Unfortunately, this is not a "pretty" number like those shown in the book, so the best we can do is to approximate its value. Single-precision format allows 23 bits for the fraction.

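Binary fractions look like this:

$$0.1_2 = \tfrac{1}{2} = 2^{-1}, \qquad 0.01_2 = \tfrac{1}{4} = 2^{-2}, \qquad 0.001_2 = \tfrac{1}{8} = 2^{-3}$$

To approximate 0.36, we can say:

$$0.36 = \tfrac{0}{2} + \tfrac{1}{4} + \tfrac{0}{8} + \tfrac{1}{16} + \tfrac{1}{32} + \cdots = 2^{-2} + 2^{-4} + 2^{-5} + \cdots$$

$$0.36_{10} \approx 0.01011100001010001111011_2$$

This is the 23-bit value nearest to 0.36. The binary string we need is: 01011100001010001111011.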
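The 23 fraction bits can also be obtained by scaling the fraction by $2^{23}$ and rounding to the nearest integer, which is what produces the final 1 in the string above (truncating instead would end in ...1010). A sketch with a hypothetical helper; Python's `round` rounds exact ties to even, like IEEE 754's default rounding mode, although no tie occurs here.

```python
def fraction_bits(fraction: float, nbits: int = 23) -> str:
    """Nearest nbits-bit binary approximation of a fraction in [0, 1)."""
    if not 0 <= fraction < 1:
        raise ValueError("fraction must be in [0, 1)")
    scaled = round(fraction * (1 << nbits))  # fraction * 2**23, rounded to nearest
    if scaled == 1 << nbits:
        raise ValueError("rounds up to 1.0; the exponent would need adjusting")
    return format(scaled, f"0{nbits}b")


print(fraction_bits(0.36))  # 01011100001010001111011
```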

Now put the binary strings in the correct order: 1 bit for the sign, followed by 8 for the exponent, and 23 for the fraction. The answer is:

Sign: 0 (decimal), 0 (binary)

Exponent: 123 (decimal), 01111011 (binary)

Fraction: 0.36 (decimal), 01011100001010001111011 (binary)

Result: 0 01111011 01011100001010001111011
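The hand calculation can be checked against the machine's own conversion. This sketch packs 0.085 into 4 bytes with Python's standard `struct` module (format `>f`, big-endian single precision) and splits the 32 bits into the three fields; `float32_fields` is a hypothetical helper.

```python
import struct


def float32_fields(x: float) -> tuple[int, int, int, int]:
    """Return (sign, biased exponent, fraction, all 32 bits) of x in IEEE 754 single precision."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF  # 8 bits
    fraction = bits & 0x7FFFFF      # 23 bits
    return sign, exponent, fraction, bits


sign, exponent, fraction, bits = float32_fields(0.085)
print(sign, format(exponent, "08b"), format(fraction, "023b"))
print(f"{bits:08X}")  # 3DAE147B
```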

Thank you

Q & A