Data Representation Art 311 Dr. J R. Parker


Data Representation

Art 311

Dr. J R. Parker

Key Concept #4 - Archive

Which is to say, memory.

Humans have a memory of their own experiences, but have come to rule the planet by extending that (via teaching) to memories of past and other people’s experiences.

How is this stored, accessed?

Key Concept #4 - Archive

We now have unparalleled access to public information.

We are now personally responsible for our own information and our family’s.

We have immediate access to most human knowledge.

We can not only read but publish, both truth and lies.

What is real?

Key Concept #4 - Archive

Archive in new media is partly what is being manipulated by the interfaces.

Flickr, YouTube manipulate images

Blogs, eBooks, web sites

but also museums, libraries.

Data Representation

The basic question today is:

“Given that a computer only manipulates numbers, how can we represent interesting things like images, sounds, graphics, text, video, and so on?”

The answer differs depending on the type of data.

Data Representation

This subject is basic to creating new things on a computer.

We need to become familiar with the standard methods of representing data.

We need to acquire the skill of inventing new representations for things that we ourselves invent.

So, how would you represent music (i.e., notes)? More later…

Text

So, we’ve already seen how text is represented, at least briefly. Remember ASCII?

There are 95 printable characters (counting the space).

128 characters altogether.

Text

What do we need to consider when building a text representation?

- Upper case/lower case
- The space character has to come before others to make sorting easy.
- Non-alphanumeric characters were positioned to correspond to their shifted position on typewriters.
- The first two columns (32 positions) were reserved for control characters.
- The digits 0–9 were placed so they correspond to values in binary prefixed with 011, making conversion with binary-coded decimal straightforward.
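A few of these design choices can be checked directly. This is a small sketch using Python's built-in ord(); the characters tested are just illustrative picks.

```python
# Inspect ASCII codes with Python's built-in ord().

# Digits: the code for '0'..'9' is binary 011 followed by the digit's 4-bit value.
for d in "0123456789":
    code = ord(d)
    assert code >> 4 == 0b011        # the high bits are 011
    assert code & 0b1111 == int(d)   # the low four bits are the digit itself

# Upper and lower case letters sit a fixed distance apart.
case_gap = ord("a") - ord("A")

# The space character sorts before every letter and digit.
space_first = sorted(["b", " ", "A", "7"])[0]

seven_bits = format(ord("7"), "07b")  # 7-bit binary code for the character '7'
print(seven_bits, case_gap, repr(space_first))
```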

Text

This is a typical ASCII table.

You do not need to know it.

An irritating detail is that characters are defined in a base-16 number system called HEXADECIMAL or just HEX.

Why? Allow me to explain.

Hexadecimal

My feeling, after being involved with computers since 1971, is that computer guys are lazy. This is not a bad thing, but motivates much of what they do.

We are so lazy that we will spend days writing a program to do simple things that we’re bored with doing.

Much of the history of computing can be explained by the desire to use a computer to avoid tedious, repetitive work.

Hexadecimal

So, HEX:

All numbers in a computer are binary, or base 2.

0001 is one
0010 is two
0100 is four
1000 is eight

And so on. Binary digits stand for powers of 2, just as decimal digits stand for powers of ten.

Hexadecimal numbers use base 16.

Why is this convenient? I’m getting there.

Hexadecimal

Base 16 is a problem, as we would need 16 distinct characters as digits. We use the letters A, B, C, D, E, F in conjunction with our regular digits.

So 1 is still one … and 9 is still nine. But

A is ten
B is eleven
C is twelve
D is thirteen
E is fourteen
and F is fifteen.

Why is this convenient? I’m getting there.

Hexadecimal

Positional number systems use powers of the base.

16^0 is 1
16^1 is 16
16^2 is 256
16^3 is 4096
…

Why is this convenient? I’m getting there.

Hexadecimal

Counting in base 16:

0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F,
10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 1A, 1B, 1C, 1D, 1E, 1F,
20, …

so 20₁₆ = 32₁₀

Why is this convenient? I’m getting there.

Hexadecimal

Reminder: why are we doing this? So we can read computer science tables and documents. Like the ASCII table. (I’m teaching you to read!)

Converting: 12₁₆ is 1×16 + 2 = 18₁₀

2A1₁₆ is 2×256 + 10×16 + 1 = 673₁₀
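The same conversion can be sketched in a few lines of Python. The function name hex_to_decimal is made up for this illustration; Python's built-in int(text, 16) does the same job.

```python
HEX_DIGITS = "0123456789ABCDEF"

def hex_to_decimal(text):
    """Convert a hex string to a number, one digit at a time."""
    value = 0
    for ch in text.upper():
        # Shift what we have one hex place to the left, then add the new digit.
        value = value * 16 + HEX_DIGITS.index(ch)
    return value

print(hex_to_decimal("12"))    # 18
print(hex_to_decimal("2A1"))   # 673
print(int("2A1", 16))          # the built-in gives the same answer
```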

Why is this convenient? I’m getting there.

Hexadecimal

Now, 16 is an exact power of 2 (it is 2^4).

Each hex digit takes exactly 4 binary digits (BITS) to represent in binary.

So converting from hex to binary and back is trivially simple.

Converting hex to binary: replace each hex digit with the binary equivalent

   2    A    1

2A1₁₆ = 0010 1010 0001 = 001010100001₂ (= 673₁₀)

Why is this convenient? I’m getting there.

Hexadecimal

Converting binary to hex: group binary number into sets of 4 digits (bits) and convert those into hex.

So 0110101001010010101 becomes (grouping from the right)

011 0101 0010 1001 0101
  3    5    2    9    5

35295₁₆ = 0110101001010010101₂
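Both directions can be sketched in Python by working one hex digit (four bits) at a time. The function names are made up for this illustration.

```python
def hex_to_binary(hex_text):
    """Replace each hex digit with its 4-bit binary equivalent."""
    return " ".join(format(int(ch, 16), "04b") for ch in hex_text)

def binary_to_hex(bits):
    """Group bits into fours from the right, then convert each group."""
    bits = bits.replace(" ", "")
    # Pad with zeros on the left so the length is a multiple of 4.
    padded = bits.zfill((len(bits) + 3) // 4 * 4)
    groups = [padded[i:i + 4] for i in range(0, len(padded), 4)]
    return "".join(format(int(group, 2), "X") for group in groups)

print(hex_to_binary("2A1"))
print(binary_to_hex("0110101001010010101"))
```

Padding on the left is what "group from the right" amounts to: the short group always ends up at the front.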

That’s why this is convenient.

Hexadecimal

Yup, that’s it.

Easy conversion between Hex and Binary, and hex uses many fewer digits.

We can list binary numbers in a lot less space.

That’s why this is convenient. Let’s move on …

Text

So characters are binary numbers when stored in memory, and they are often coded using ASCII.

A string is a sequence of characters. In a file we can indicate them using quotes: “This is a string”

In memory they are placed in consecutive locations.

Two ways to do this:
1. Start with an indication of how many characters there are.
2. Terminate the string with a special character.

Text

Char    Counted   Nul-terminated
        16
T       84        84
h       104       104
i       105       105
s       115       115
space   32        32
i       105       105
s       115       115
space   32        32
a       97        97
space   32        32
s       115       115
t       116       116
r       114       114
i       105       105
n       110       110
g       103       103
                  0

The first string begins with a count.

The second ends with a character whose code is 0. This is a nul character, and the string is referred to as a nul-terminated string.
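The two layouts can be sketched with Python bytes objects. Using a single byte for the count is an assumption made here for simplicity (it limits a string to 255 characters); the helper names are made up.

```python
text = "This is a string"

# 1. Counted string: one byte holding the length, then the characters.
counted = bytes([len(text)]) + text.encode("ascii")

# 2. Nul-terminated string: the characters, then a byte with value 0.
terminated = text.encode("ascii") + b"\x00"

def read_counted(buf):
    """Read the count first, then take exactly that many characters."""
    n = buf[0]
    return buf[1:1 + n].decode("ascii")

def read_terminated(buf):
    """Scan forward until the first nul byte."""
    end = buf.index(0)
    return buf[:end].decode("ascii")

print(list(counted))
print(list(terminated))
```

The counted form knows its length immediately; the terminated form has to scan for the 0, but has no fixed upper limit on length.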

Dates

ISO          1987-10-12
IBM USA      10/12/1987
IBM Europe   12.10.1987
Unf Julian   1987285
Julian       87/285
MDY          10/12/87
YMD          87/10/12
DMY          12/10/87
             October 12, 1987
             12 Oct 87
             Oct 12, 1987
Etc., etc.

Of the text strings, dates are the hardest to deal with.

There are many, many ways to display them, and many things we want to do with them.

Dates

Questions: why do we use date information?

- Has X passed?
- Print X in a particular way
- How many days since X?
- How long until X?
- Input X from the console

Shouldn’t the representation make answering the common questions simple?

Dates

Store year as 4 digits (avoids the Y2K problem)

Do not store month as a string. That is hard to use; store it as a number.

Store day as a number.

E.g. 2012 01 12

Dates

BUT: each month has a different number of days. This makes differences hard to calculate.

Days between Mar 10 and May 12?

Dates

Days between Mar 10 and May 12 = 63 (not counting last day)
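Once a date is stored as numbers, a library can do the month arithmetic. This sketch uses Python's standard datetime module; the year 2012 is an assumption, since the example gives only month and day.

```python
from datetime import date

# The year is assumed; any year gives the same answer for these two dates.
start = date(2012, 3, 10)
end = date(2012, 5, 12)

# Subtracting two dates gives a difference measured in days.
days_between = (end - start).days
print(days_between)   # 63

# "Has X passed?" becomes a plain comparison.
has_passed = start < end
print(has_passed)
```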

Dates

The international standard ISO 8601 describes a string representation for dates and times. Two simple examples of this format are

2007-03-04 20:32:17
20070304T203217

Dates

2007-03-04 20:32:17
20070304T203217

Both stand for the 4th of March 2007, a bit after half past eight in the evening (forgot about time).

Dates

Unix time: the number of seconds elapsed since the beginning of the year 1970 (midnight UTC on January 1).

1172960204.226908
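Here is a sketch of moving between these representations with Python's standard datetime module. Treating the ISO 8601 example as UTC is an assumption; the slides give no time zone.

```python
from datetime import datetime, timezone

# Unix time -> calendar date and time (in UTC).
stamp = 1172960204.226908
moment = datetime.fromtimestamp(stamp, tz=timezone.utc)
print(moment.isoformat())

# ISO 8601 basic format -> datetime -> Unix time.
# Assumption: the string is in UTC.
parsed = datetime.strptime("20070304T203217", "%Y%m%dT%H%M%S")
parsed = parsed.replace(tzinfo=timezone.utc)
seconds = parsed.timestamp()
print(seconds)

# With everything in seconds, "how long until X?" is a subtraction.
gap_seconds = seconds - stamp
print(gap_seconds)
```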

Dates

This discussion was started just to show how some simple things can become complicated.

We use dates all of the time, but the millennia have made them complex rather than simple.

Text

Printing: give the address of a string to the printer. It converts the numbers (characters) into electronic signals that print the characters (or draw character images onto a page).

Characters each have an image that represents them. It’s called a glyph.

Glyphs

A glyph is a simple graphic.

The letter ‘B’ is drawn as:

The paper is white, and the drawn glyph consists of black ‘spots’ drawn by the printer on a 2D mesh or grid.

This is a simple image – more on images later.

Glyphs

The point is that, for any particular size (indicated by how many dots are on each side of the glyph image) a character glyph contains a certain percentage of black.

That can be thought of as how black the glyph is.

This allows us to create images with characters.

Glyphs

For the ‘B’ on the right, there are 11 rows and 10 columns = 110 squares.

Of those, 8+5+4+4+3+6+4+4+4+5+8 = 55 are black (counting row by row).

55/110 = 50%

So any spot on an image to be created that is 50% black can be drawn as a ‘B’
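The blackness calculation is easy to sketch in Python. The row counts are the ones listed above; the function name is made up.

```python
# Black squares in each of the 11 rows of the 'B' glyph, as listed above.
black_per_row = [8, 5, 4, 4, 3, 6, 4, 4, 4, 5, 8]
columns = 10

def blackness(row_counts, width):
    """Fraction of the glyph's squares that are black."""
    total_squares = len(row_counts) * width
    return sum(row_counts) / total_squares

b_blackness = blackness(black_per_row, columns)
print(b_blackness)   # 0.5
```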

Glyphs

Darker    .'`,^:";~
 /|\      -_+<>i!lI?
  |       /\|()1{}[]
  |       rcvunxzjft
  |       LCJUYXZO0Q
 \|/      oahkbdpqwm
Lighter   *WMB8&%$#@

This is for white characters on a black background. Reverse for printing on paper.
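Turning grey levels into characters can be sketched as a lookup into such a ramp. The short ten-character ramp and the tiny test image here are assumptions made for illustration, not the full table above.

```python
# A short ramp from dark-looking to light-looking characters
# (white characters on a black background). Illustrative only.
RAMP = " .:-=+*#%@"

def to_ascii(image, max_level=255):
    """Map each grey level (0 = black, max_level = white) to a character."""
    lines = []
    for row in image:
        line = ""
        for level in row:
            index = level * (len(RAMP) - 1) // max_level
            line += RAMP[index]
        lines.append(line)
    return "\n".join(lines)

# A made-up 4 x 16 image that fades from black (left) to white (right).
gradient = [[x * 255 // 15 for x in range(16)] for _ in range(4)]
art = to_ascii(gradient)
print(art)
```

For printing on paper, reverse the ramp so that dark spots get the heavy characters.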

ASCII Images

+WWWMMWWX;VBVIVVXRRRMMMWWWWWWWMMWWWWMMMBRRBRRRVi MWWWWBRMBYXVVXI+;;+IIXBWWWWWWWWWWWWWWMMBBBRBBBBMI XWWWWMVRXVt;t+=IXBRRYi=iVMWWWWWWMXVYIYVYVBBBBBBRBMBI ,MWMWBYXXRBR=.=tYVBMMWMV=+RMWWWWBXVIVRRRRViIBBRBXYMWWV MWRRItRBMMMM::+,+ttIVVMM;iBMWWMRRMMWWMBRRRYiVVtVVRBMY MWXtX=tMMMMMIt,.:=tYBMRBBIBWWMMMVIItXBMXVVI+tIiI:YRXMB WMBIiR+YtBBMXRBMMMMMMMBRYRMWMMMMWWXti,.;tItIIYYiRRBBMMV ,MWMMtIRR=,+XBBMMMMMMMMWR:;RWWBRMMWWWMBRXYXBBXYYV+VMWMBMMM VWMMi::BW, IIRMMMMWWWWX+..,I+:iMMWWWWWWMMMBBRXVXYRMMWMBMMX +WMMY::,BX :RMBMMMWWMMRYVVXMWBXVMWWWWWWWMMBRXXXXYIRMBMMBBMR =WMBBi:tt tBBRBBMMMMRYtitYYVXMWWWWWMWWMMBXXVYXVi:VMMMMMMWV RWRV+,;,, XBRRRBBBt,..:+t=+:..+RWWWWWMBBXVVVXXIiI:IVMMBMMMW IMRBRRBMB= YXRRRBBBB,..+YBBBBV...=MWWMBBRRVVXXBYXRRBXBMMBMMMB tMYitIXMi iYXXRXRBRt;.:itt=:=iXMWWWMBBRRVXXRRVRMWWMMMMMMBBB :MR:.,:Ii YVVXRBBBMMRRRBMWMMMMMMMMBBRXXXYY+VMXMWMBMMBBVVRI :RY+ iVRBBBBMMMWWWMMMMMMMBBBBRRXVt.iXYRYXMBBBYItVVt =XVY, ;tVRBBMMMWWWMMMMBMBBBBXVYt iXBMWWMRVXYXMMY RX:, ,,;+tIIVXRBBRXVVIIi=,.= iYYt+;,;iYRMMXt

ASCII Line Images

Pictures

The pictures we have seen are rows and columns of ASCII characters.

Computer images are always stored in that way, but are not ASCII.

We have a 2D grid of elements, let’s say boxes, each having a distinct colour or grey level. Like a TV image.

Pictures

Rows are numbered 1 to 6 and columns 1 to 10.

Picture elements (pixels) are identified by [row, column], for example 4,5.

Pictures

Picture elements are numbers that indicate a colour or a grey level. E.g. let 0 be black and 1 be white:

Letter ‘T’

000000000000000000000000000000000000000
000000000000111111111111111111100000000
000000000000111111111111111111100000000
000000000000000000011111000000000000000
000000000000000000011111000000000000000
000000000000000000011111000000000000000
000000000000000000011111000000000000000
000000000000000000011111000000000000000
000000000000000000011111000000000000000
000000000000000000000000000000000000000
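In a program, such a picture is just a list of rows. This sketch rebuilds a ‘T’ like the one above; the exact dimensions (10 rows, 39 columns, and the positions of the bar and stem) are read off the listing and should be treated as assumptions.

```python
WIDTH = 39

# 0 is black, 1 is white.
blank = [0] * WIDTH
bar = [0] * 12 + [1] * 19 + [0] * 8     # the top stroke of the T
stem = [0] * 19 + [1] * 5 + [0] * 15    # the vertical stroke

image = [blank, bar, bar] + [stem] * 6 + [blank]

def pixel(img, row, column):
    """Return the value at [row, column], counting from 1 as the slides do."""
    return img[row - 1][column - 1]

white_count = sum(sum(row) for row in image)
print(pixel(image, 2, 13), pixel(image, 1, 1), white_count)
```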

Graphics (line drawings)

Lines are drawn on a canvas or background of some kind. It has a size.

Lines can be defined by specifying the end points, and these can be specified as pixels.

So (10,10) (20,20) is a line (segment) between those two pixels.

Entire objects can be drawn using these segments alone.
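To draw a segment, a program has to decide which pixels lie between the end points. The sketch below uses simple even stepping; it is one straightforward way to do it, not necessarily what any particular graphics system uses.

```python
def line_pixels(x0, y0, x1, y1):
    """Return the pixels on the segment from (x0, y0) to (x1, y1)."""
    # Take one step per pixel along the longer direction.
    steps = max(abs(x1 - x0), abs(y1 - y0))
    if steps == 0:
        return [(x0, y0)]
    pixels = []
    for i in range(steps + 1):
        x = x0 + round(i * (x1 - x0) / steps)
        y = y0 + round(i * (y1 - y0) / steps)
        pixels.append((x, y))
    return pixels

segment = line_pixels(10, 10, 20, 20)
print(segment)
```

So the stored drawing needs only the two end points; the pixels in between are computed when the line is displayed.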

Sound

Computer sound is a sequence of loudness measurements, recorded as electronic levels or voltages, converted into binary, and stored (in order) in a file.
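Here is a sketch of what those measurements look like, using a computed tone in place of a microphone. The sample rate, the tone frequency, and the choice of one byte per sample are all assumptions made for illustration.

```python
import math

SAMPLE_RATE = 8000   # measurements per second (assumed)
FREQUENCY = 440.0    # pitch of the test tone in Hz (assumed)

def sample_tone(n_samples):
    """Measure the tone n_samples times and store each level as 0..255."""
    samples = []
    for n in range(n_samples):
        # The level of the wave at this instant, between -1.0 and 1.0.
        level = math.sin(2 * math.pi * FREQUENCY * n / SAMPLE_RATE)
        # Convert to a whole number that fits in one byte; 128 is silence.
        samples.append(round(level * 127) + 128)
    return samples

samples = sample_tone(8)
data = bytes(samples)   # the numbers, in order, ready to be written to a file
print(samples)
```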

Sound

On a compact disc, data is read by bouncing a low-powered laser beam off the reflective coating in the disc. Light hitting a land (a flat area) is reflected back, and picked up by a photosensitive detector. Light hitting a pit is reflected back with far less intensity.

1’s and 0’s.

Video

Video is a sequence of pictures, sampled at a known rate.

TV is nearly 30 pictures (frames) per second.

35 mm film is 24 frames per second.

We can use any rate we like.

What other things are there?

ANYTHING that a computer manipulates is stored as numbers, and the scheme used to convert it into numbers is called a coding scheme.

A codec is short for coder/decoder, and is software that implements the coding scheme.

Code? What do you mean, ‘code’?

Video is a series of images that, when displayed rapidly one after the other, give the illusion of motion. Like a ‘flip book’.

However, a TV image is 512x512 (just about) = 262K bytes, at one byte per pixel.

1 second = 30 x 262K = 7.86 Mbyte
1 minute = 471 Mbyte
1 hour = 28 Gbyte
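The arithmetic can be checked in a few lines. One byte per pixel is an assumption; a colour image would need more.

```python
WIDTH, HEIGHT = 512, 512
BYTES_PER_PIXEL = 1        # assumption: one grey level per pixel
FRAMES_PER_SECOND = 30

frame_bytes = WIDTH * HEIGHT * BYTES_PER_PIXEL
per_second = frame_bytes * FRAMES_PER_SECOND
per_minute = per_second * 60
per_hour = per_minute * 60

print(frame_bytes)   # 262144
print(per_second)    # 7864320
print(per_minute)    # 471859200
print(per_hour)      # 28311552000
```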

We need to compress the images, and that’s where code/decode comes in.

Code? What do you mean, ‘code’?

Each image can be compressed. JPEG compression can reduce size by a factor of 15 before artifacts can be seen clearly.

Code? What do you mean, ‘code’?

In a video, we can also compress between consecutive images.

Code? What do you mean, ‘code’?

MPEG tries to predict motion based on previous and following frames.

An I-frame shows a triangle on a white background. A following P-frame shows the same triangle but at another position.

Code? What do you mean, ‘code’?

Reconstruction of inter-coded frames goes ahead in two steps:

1. Application of the motion vector to the referred frame;
2. Adding the prediction error compensation to the result.
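The two steps can be sketched on a toy frame. This is a heavy simplification: a real MPEG decoder applies separate motion vectors to small blocks of the image, while this sketch assumes one horizontal shift for the whole frame. All names and numbers are made up.

```python
def shift_right(frame, dx):
    """Step 1: apply a horizontal motion vector of dx pixels to the frame."""
    width = len(frame[0])
    return [[row[x - dx] if 0 <= x - dx < width else 0 for x in range(width)]
            for row in frame]

def reconstruct(reference, dx, error):
    """Step 2: add the prediction error to the motion-shifted reference."""
    predicted = shift_right(reference, dx)
    return [[p + e for p, e in zip(p_row, e_row)]
            for p_row, e_row in zip(predicted, error)]

# A tiny reference frame with a bright block on the left.
reference = [[0, 9, 9, 0, 0],
             [0, 9, 9, 0, 0]]

# The small corrections that the shift alone does not explain.
error = [[0, 0, 0, 1, 0],
         [0, 0, 0, 0, 0]]

frame = reconstruct(reference, 2, error)
print(frame)
```

Only the motion vector and the (mostly zero) error need to be stored, which is where the saving comes from.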

Code? What do you mean, ‘code’?

As an example, the frame sequence above is transferred in the following order:

I P B B B P B B B.

The only task of the decoder is to reorder the reconstructed frames. To support this, an ascending frame number comes with each frame (modulo 1024).

Compression

At this point we quit. It’s too much detail.

I find compression dull, but some folks spend their lives working on these things.

DVD video is also compressed, a kind of MPEG.

AVI is a file format, within which various kinds of compression can be used.

MOV (QuickTime) is also a multimedia container file that can support many audio and video formats (AIFF, WAV, DV, MP3, and MPEG-1).

What else??

Questions on how to store any other kinds of data??