First text chapter

Higgins’ course powerpoints

Powerpoint 1 for the first two chapters of text

Remarks

• Text powerpoints are distributed to course instructors. I have put these on my P:drive.

• Course materials are very nice but omit some details and examples which I’ll try to fill in.

• My supplementary powerpoints will be on my w:drive.

Protocols

• Nodes and domains have names like HigginDM and www.oneonta.edu.

• Computers represent these addresses numerically, i.e., as “IP” addresses.

• Organizations are assigned blocks of addresses.

• Domain names are hierarchical: read right to left, they go from most general to more specific.

• Servers implementing the Domain Name System (DNS) perform translation from fully-qualified domain names to numeric IP addresses.
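To make the naming hierarchy concrete, here is a minimal Python sketch that splits a fully-qualified domain name into its labels, most general first. The function name is made up for illustration, and it does no real DNS lookup, which would need a network connection.

```python
def domain_labels(fqdn):
    """Return the labels of a domain name, most general first."""
    # Strip an optional trailing dot (the DNS root), then split on dots.
    labels = fqdn.rstrip(".").split(".")
    # Names are written most specific first, so reverse them.
    return list(reversed(labels))


print(domain_labels("www.oneonta.edu"))  # ['edu', 'oneonta', 'www']
```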

http://www.oneonta.edu/

Web or internet

• The internet refers to the physical computers connected by wires.

• Computers communicate via TCP/IP (Transmission Control Protocol/Internet Protocol), and higher-level protocols often sit on top of it.

• Typically individual machines connect to a LAN and thence to the internet via a single server.

• The web refers to the software and protocol(s) (principally http) by which these machines communicate.

Servers

• Servers make documents in the document root available to clients via URLs which map to the actual file.

• Remember, addresses (URLs) are mapped by the server to their physical location so, for example, http://www.flowers.com/tulips.html might be mapped to a file on the physical server machine at /admin/web/topdocs/bulbs/tulips.html
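The mapping from URL to file can be sketched in a few lines of Python. This is only an illustration of the idea, not how Apache does it internally; the document root is taken from the tulips example above and the function name is hypothetical.

```python
import posixpath
from urllib.parse import urlsplit

# Assumed document root, borrowed from the tulips example.
DOCUMENT_ROOT = "/admin/web/topdocs/bulbs"


def map_url_to_file(url, document_root=DOCUMENT_ROOT):
    """Map the path part of a URL to a file under the document root."""
    path = urlsplit(url).path
    # Normalize the path so that ".." segments cannot climb out of the root.
    relative = posixpath.normpath("/" + path).lstrip("/")
    return posixpath.join(document_root, relative)


print(map_url_to_file("http://www.flowers.com/tulips.html"))
# /admin/web/topdocs/bulbs/tulips.html
```

Note that the client only ever sees the URL; the path under the document root stays private to the server.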


Apache

• Apache is an open-source server, free for download.

• The XAMPP download of Apache contains MySQL and FileZilla and, if you like, Tomcat as well. It also has Perl.

• Apache requires the Tomcat servlet container (a mini-server in its own right) to run JSP or servlets.

• Apache is the most popular server on the web.

• We will use Apache in the labs to serve Perl and PHP.

• Ruby will be served with WEBrick or Mongrel servers, which come with the Instant Rails download.

• The name (Apache) comes from “a patchy” version of the httpd server.

• The config file httpd.conf in Apache has its name for historic reasons.

instant rails

• A newer distribution (much more on this later) called “Instant Rails” contains PHP, Apache and MySQL, as well as Ruby. Perl can be enabled by adding a cgi-bin directory to the server directory in this distribution.

• We will likely use instant rails rather than XAMPP.

Programmer’s toolbox

• The text contains some coverage of XHTML, JavaScript, Perl, PHP, Ruby, as well as Java (Applets, Servlets and JSP).

• Java is too complicated for us to cover in the context of this course.

• JavaScript is not Java. It got its name because it was developed around the time Java became very popular and it has a similar syntax.

• PHP developed out of a more limited project called “Personal Home Page”.

• Ruby is a strictly OO language (like Smalltalk, and more so than Java) developed by Matsumoto in the early 1990s out of dissatisfaction with Python and Perl. The text develops Ruby with Rails applications and we will do some of that.

More… Ruby…Rails…Ajax

• This intro is in the text ed 4 but missing from the new ppt for chapter 1.

• In Ruby, everything is an object and operators are just methods and can be overridden. All variables are dynamically typed and no declarations are used. Methods are also dynamic in the sense that they can be added during a program’s execution.

• Rails is a development framework for web-based applications that may access a database. The Rails “framework” produces the standard components of an application and the developer customizes these. Rails is based on the MVC architecture which separates presentation and data model from the logic.

More… Ruby…Rails…Ajax

• Rails is written in Ruby and designed to be used with Ruby though that is not a requirement.

• AJAX is shorthand for Asynchronous JavaScript and XML. In traditional web interactions the client sends a message to the server (when the user clicks something, for example) and then waits. The current browser display is then replaced with the new document provided by the server, and the transmission and rendering time can be disruptive. In AJAX web apps, the browser doesn’t need to wait for the server’s response to continue, and each response covers only a small part of the document, so it can be displayed faster.

scripting vs programming languages

• JavaScript and PHP (ASP and JSP, too) are “scripting languages” because they are often embedded in a larger or different application context, HTML.

• That is, bits of script are interspersed with standard HTML.

• Java, Ruby and Perl are full-blown programming languages.

• Our coverage will be more of a survey, with plenty of examples, of Perl, PHP, Ruby and JavaScript.

Perl

• Before scripting languages like ASP, JSP and PHP were available, a document could request a program to be run on the server.

• This is done via the Common Gateway Interface (CGI).

• CGI is a mechanism by which the browser and server communicate to run a program on the server and return the results.

• Executing a Perl script on the server via CGI spawns a new process running the Perl interpreter.

• C is also used for CGI but Perl is probably the most popular.

• Because of overhead, speed limitations, and the availability of other solutions (like PHP, JSP, applets and servlets, and so on), Perl has undergone some renovations (mod_perl) and has also waned in popularity.
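The CGI exchange described above can be sketched as follows. The course uses Perl for CGI; this sketch is in Python only to keep the supplementary examples in one language. It assumes the server passes the query string in the QUERY_STRING environment variable (as CGI does) and that a "name" parameter is sent; both the parameter and the function name are made up for illustration.

```python
import os
from html import escape
from urllib.parse import parse_qs


def cgi_response(environ):
    """Build the text a CGI program would write to standard output."""
    # The server hands request data to the program in environment variables.
    query = parse_qs(environ.get("QUERY_STRING", ""))
    name = query.get("name", ["world"])[0]
    body = "<html><body><p>Hello, %s!</p></body></html>" % escape(name)
    # A CGI reply is a header block, a blank line, then the document.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    print(cgi_response(os.environ), end="")
```

Every request starts this program afresh, which is the overhead the last bullet refers to.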

client-server

• Browsers are software that runs on client machines and delivers documents (often HTML, but not always) to the user.

• Servers are software that sit on a logically (though not necessarily physically) remote machine and deliver responses to ‘get’, ‘post’, ‘trace’ and other client requests.
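For a feel of what a client request looks like on the wire, here is a small Python sketch that builds the text of a minimal GET request. The helper name is hypothetical and nothing is actually sent over the network.

```python
def build_get_request(host, path="/"):
    """Return the text of a minimal HTTP/1.1 GET request."""
    lines = [
        "GET %s HTTP/1.1" % path,  # request line: method, path, version
        "Host: %s" % host,         # required header in HTTP/1.1
        "Connection: close",
        "",                        # blank line ends the header block
        "",
    ]
    return "\r\n".join(lines)


print(build_get_request("www.oneonta.edu", "/"))
```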

Servers

• One directory structure under the server is the document root where documents which the server can access are stored.

• The server root and other directories under it store the server itself and support software.

• The server maps requested URLs to files under the document root (whose actual location is not known to the client).

HTTP: hypertext transfer protocol

XHTML

• HTML and XML are both derived from SGML (Standard Generalized Mark-up Language), an ISO standard for describing text-formatting languages.

• HTML was originally intended to describe document structure for uniform presentation across different browsers and platforms.

XML and XHTML

• HTML went through a number of transformations, but still does not guarantee standardized presentation across browsers.

• It is not a strict language, and syntactically incorrect HTML is usually presented anyway in some fashion or other.

• XML is a strict notational format. X stands for eXtensible.

• XML can be “extended”, by adding tags for application-specific features.

XML and XHTML

• XML documents can be validated to assess their syntactic correctness and adherence to a predefined application definition (a DTD or Schema).

• XHTML (1.0 in 2000) redefined HTML as an XML language with its own DTD, so it can be validated.

• The latest versions of MS IE and Mozilla support (mostly) the latest standard, XHTML 1.1.

• Stylesheets (CSS) now replace the deprecated presentational features of HTML 4.0.

• There are three “levels” of XHTML 1.0 (Strict, Transitional and Frameset), including a “transitional” level allowing inclusion of deprecated features of HTML 4.0.
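The strictness of XML is easy to see with a parser. The Python sketch below checks only well-formedness (properly nested and closed tags), which is the first half of validation; Python's standard library does not check a document against a DTD, so that part is left out. The function name is my own.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document):
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


print(is_well_formed("<p>closed properly</p>"))  # True
print(is_well_formed("<p>never closed"))         # False
```

A browser would typically still display the second fragment as HTML, which is exactly the leniency described above.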

HTML or XML?

• Some old browsers choke on XHTML.

• Because so many documents on the web are in HTML it will be supported for a long time.

• XML requires more programmer discipline to write.

• HTML documents lack consistency, because browsers don’t enforce standards and programmers don’t adhere to the ones it does have.

• XML documents can be validated by an XML browser or a validating tool (some free for download).

• XHTML editors provide support for creating XHTML documents.

XHTML Validation

• The World Wide Web Consortium is abbreviated W3C.

• It has a file-upload validation service at http://validator.w3.org/file-upload.html

• Screenshot in text

• MS has a validator you might be able to find and download for desktop use.


Document Structure

• XML documents must have a single “root” node. Which element it is depends on the application.

• XHTML documents must have html as the root.
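The root-element rule can be checked mechanically. This Python sketch parses a small XHTML document (a trimmed-down sample written for this note, without a DOCTYPE) and tests whether its root is html in the XHTML namespace. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

SAMPLE = b"""<?xml version="1.0" encoding="utf-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Our first document</title></head>
  <body><p>Greetings from your Webmaster!</p></body>
</html>"""


def has_html_root(document):
    """Return True if the root element is html in the XHTML namespace."""
    root = ET.fromstring(document)
    # ElementTree writes namespaced tags as "{namespace}name".
    return root.tag == "{%s}html" % XHTML_NS


print(has_html_root(SAMPLE))                     # True
print(has_html_root(b"<note><to>x</to></note>"))  # False
```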

Slightly modified text example

<?xml version = "1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN"
  "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns = "http://www.w3.org/1999/xhtml">
  <head>
    <title> Our first document </title>
  </head>
  <body>
    Greetings from your Webmaster!
  </body>
</html>

Running validator on the previous example – missing encoding

A variation of text example of blockquote tag

Just the body

<body>
  Abraham Lincoln was president of the US through the Civil War and was assassinated by John Wilkes Booth at Ford's Theatre. His Gettysburg Address was delivered on the field of perhaps the greatest battle of the war...
  <blockquote>
    "Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal. We are now engaged in a great civil war, testing whether that nation, or any nation, so conceived, and so dedicated, can long endure."
  </blockquote>
  The Civil War and Lincoln left a lasting imprint on our nation.
</body>

meta

• The meta element provides additional information about a document. It has no content but provides information through attributes name and content.

• A common name is keywords, with content set to various document keywords.

<meta name="keywords" content="binary trees, linked lists, stacks" />
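Since meta carries its information in attributes, a program (a search-engine indexer, say) can pull the keywords out without reading the page text. A rough Python sketch using the standard html.parser module follows; the class and function names are my own.

```python
from html.parser import HTMLParser


class KeywordCollector(HTMLParser):
    """Collect the content of <meta name="keywords"> elements."""

    def __init__(self):
        super().__init__()
        self.keywords = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "meta" and attributes.get("name") == "keywords":
            content = attributes.get("content") or ""
            # Keywords are comma-separated inside the content attribute.
            self.keywords += [k.strip() for k in content.split(",") if k.strip()]


def extract_keywords(markup):
    collector = KeywordCollector()
    collector.feed(markup)
    return collector.keywords


print(extract_keywords(
    '<meta name="keywords" content="binary trees, linked lists, stacks" />'
))  # ['binary trees', 'linked lists', 'stacks']
```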

About graphics file formats

• Text doesn’t have much to say, and we don’t really need to know all this, but the next bunch of slides (21-32) are notes from Wikipedia

Wikipedia article on file compression lists:

• 4 Major graphic file formats

• 4.1 Raster formats

– 4.1.1 JPEG

– 4.1.2 TIFF

– 4.1.3 RAW

– 4.1.4 PNG

– 4.1.5 GIF

– 4.1.6 BMP

– 4.1.7 WDP

– 4.1.8 XPM

– 4.1.9 MrSID

• 4.2 Vector formats

– 4.2.1 SVG

Formats… continued: JPEG

• JPEG (Joint Photographic Experts Group) image files use a lossy format. The DOS filename extension is JPG, although other operating systems may use JPEG. Nearly all digital cameras have the option to save images in JPEG format. The JPEG format supports 8 bits per color (red, green, and blue, for 24 bits total) and produces relatively small file sizes. Fortunately, the compression in most cases does not detract noticeably from the image. But JPEG files do suffer generational degradation when repeatedly edited and saved. Photographic images are best stored in a lossless non-JPEG format if they will be re-edited in future, or if the presence of small "artifacts" (blemishes), due to the nature of the JPEG compression algorithm, is unacceptable. JPEG is also used as the image compression algorithm in many Adobe PDF files.

http://en.wikipedia.org/wiki/JPEG


http://en.wikipedia.org/wiki/DOS

http://en.wikipedia.org/wiki/Filename_extension

http://en.wikipedia.org/wiki/Operating_system

Lossy and lossless

• A lossy data compression method is one where compressing data and then decompressing it retrieves data that may well be different from the original, but is "close enough" to be useful in some way. Lossy data compression is most commonly used to compress multimedia data (audio, video, still images), especially in applications such as streaming media and internet telephony. On the other hand, lossless compression is preferred for text and data files, such as bank records and text articles.

• Most lossy data compression formats suffer from generation loss: repeatedly compressing and decompressing the file will cause it to progressively lose quality. This is in contrast with lossless data compression.
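A toy Python sketch of the "close enough" idea: each 8-bit sample is reduced to one of 16 levels (4 bits instead of 8), so the restored values are near the originals but not equal to them. This is plain quantization with a made-up step size, not the JPEG algorithm, and the function names are hypothetical.

```python
STEP = 16  # assumed quantization step: 256 levels collapse to 16


def lossy_compress(samples, step=STEP):
    """Keep only which band of width `step` each sample falls in."""
    return [s // step for s in samples]


def decompress(codes, step=STEP):
    """Restore each sample as the midpoint of its band."""
    return [c * step + step // 2 for c in codes]


original = [0, 7, 100, 130, 255]
restored = decompress(lossy_compress(original))
print(original)
print(restored)  # close to the original values, but not identical
```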

http://en.wikipedia.org/wiki/Data_compression

http://en.wikipedia.org/wiki/Multimedia

http://en.wikipedia.org/wiki/Audio

http://en.wikipedia.org/wiki/Video

http://en.wikipedia.org/wiki/Images

http://en.wikipedia.org/wiki/Streaming_media

http://en.wikipedia.org/wiki/VOIP

http://en.wikipedia.org/wiki/Lossless_compression

http://en.wikipedia.org/wiki/Generation_loss

http://en.wikipedia.org/wiki/Lossless_data_compression

Lossless data compression

• Lossless data compression is a class of data compression algorithms that allows the exact original data to be reconstructed from the compressed data. This can be contrasted to lossy data compression, which does not allow the exact original data to be reconstructed from the compressed data.

• Lossless data compression is used in many applications. For example, it is used in the popular ZIP file format and in the Unix tool gzip.
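The exact-reconstruction property is easy to demonstrate with Python's zlib module, which implements the same DEFLATE method used by ZIP and gzip. The sample text is arbitrary.

```python
import zlib

original = b"Four score and seven years ago " * 20
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

print(len(original), len(compressed))  # the compressed form is much smaller
print(restored == original)            # True: every byte comes back
```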

http://en.wikipedia.org/wiki/Data_compression

http://en.wikipedia.org/wiki/Algorithm

http://en.wikipedia.org/wiki/Lossy_data_compression

http://en.wikipedia.org/wiki/ZIP_%28file_format%29

http://en.wikipedia.org/wiki/Gzip

The TIFF (Tagged Image File Format)

• is a flexible image format that normally saves 16 bits per color (red, green and blue, for a total of 48 bits) or 8 bits per color (red, green and blue, for a total of 24 bits), and uses a filename extension of TIFF or TIF. TIFF's flexibility is both a feature and a curse, with no single reader capable of handling all the different varieties of TIFF files. TIFF can be lossy or lossless. Some types of TIFF offer relatively good lossless compression for bi-level (black and white, no grey) images. Some high-end digital cameras have the option to save images in the TIFF format, using the LZW compression algorithm for lossless storage. The TIFF image format is not widely supported by web browsers, and should not be used on the World Wide Web. TIFF is still widely accepted as a photograph file standard in the printing industry. TIFF is capable of handling device-specific color spaces, such as the CMYK defined by a particular set of printing press inks.

http://en.wikipedia.org/wiki/TIFF

http://en.wikipedia.org/wiki/Bi-level_image

http://en.wikipedia.org/wiki/LZW

The RAW image format

• is a file option available on some digital cameras. It usually uses a lossless compression and produces file sizes much smaller than the TIFF format. Unfortunately, the RAW format is not standard among all camera manufacturers and some graphic programs and image editors may not accept the RAW format. The better graphic editors can read some manufacturer's RAW formats, and some (mostly higher-end) digital cameras also support saving images in the TIFF format directly. Adobe's Digital Negative Specification is a recent (September 2004) attempt at standardizing the various "raw" file formats used by digital cameras.

http://en.wikipedia.org/wiki/RAW_image_format

http://en.wikipedia.org/wiki/Digital_Negative_%28file_format%29

The PNG (Portable Network Graphics) file format

• is regarded as, and was designed to be, the free and open-source successor to the GIF file format. The PNG file format supports true color (16 million colors) whereas the GIF file format only allows 256 colors. PNG excels when the image has large areas of uniform color. The lossless PNG format is best suited for editing pictures, and the lossy formats like JPG are best for final distribution of photographic-type images because of smaller file size. Many older browsers do not support the PNG file format; however, with the release of Internet Explorer 7, all popular modern browsers fully support PNG. The Adam7 interlacing allows an early preview even when only a small percentage of the data of the image has been transmitted.
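Why PNG does well on large areas of uniform color can be seen with zlib, the DEFLATE compressor PNG builds on. This Python sketch compresses two made-up "images" of one byte per pixel, one flat and one random noise. Real PNG files also filter each scanline before compressing, which is left out here.

```python
import random
import zlib

PIXELS = 10000  # assumed image size, one byte per pixel

uniform = bytes([200]) * PIXELS  # a single flat color
rng = random.Random(0)           # fixed seed so runs are repeatable
noisy = bytes(rng.randrange(256) for _ in range(PIXELS))

uniform_size = len(zlib.compress(uniform))
noisy_size = len(zlib.compress(noisy))
print(uniform_size, noisy_size)  # the flat image shrinks far more
```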

http://en.wikipedia.org/wiki/PNG

http://en.wikipedia.org/wiki/Comparison_of_web_browsers#Image_format_support

http://en.wikipedia.org/wiki/Internet_Explorer_7

http://en.wikipedia.org/wiki/Adam7_algorithm

GIF (Graphics Interchange Format)

• is limited to an 8-bit palette, or 256 colors. This makes the GIF format suitable for storing graphics with relatively few colors such as simple diagrams, shapes and cartoon style images. The GIF format supports animation and is still widely used to provide image animation effects. It also uses a lossless compression that is more effective when large areas have a single color, and ineffective for detailed images or dithered images.
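The color counts quoted for GIF and for true-color formats follow directly from the number of bits per pixel, as this one-line Python calculation shows (the function name is mine).

```python
def color_count(bits_per_pixel):
    """Number of distinct values representable in the given bits."""
    return 2 ** bits_per_pixel


print(color_count(8))   # 256 (an 8-bit palette)
print(color_count(24))  # 16777216 (8 bits each for red, green, blue)
```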

http://en.wikipedia.org/wiki/GIF

http://en.wikipedia.org/wiki/Dither

The BMP (bit mapped) format

• is used internally in the Microsoft Windows operating system to handle graphics images. These files are typically not compressed, resulting in large files. The main advantage of BMP files is their wide acceptance, simplicity, and use in Windows programs. However, they may pose problems for users of other operating systems. Commonly, BMP files are used for Microsoft's Paint program. Since most BMP files are uncompressed, and BMP's RLE compression has serious limits, the large size of BMP files makes them unsuitable for file transfer. Desktop backgrounds and images from scanners are usually stored in BMP files.

http://en.wikipedia.org/wiki/Windows_bitmap

http://en.wikipedia.org/wiki/Microsoft_Windows


The WDP format

• is the newly introduced image format by Microsoft for media print quality, lossless image compression. This image standard has a specific applicability to mostly print media due to its size although it is rumored to be the standard for Microsoft Office 2007 and the Windows Vista operating system. This format is very similar to the TIFF format, but can handle a much larger range of image types and qualities such as 8, 16, and 32 bits per channel processing, N-Channel support, and embedded tiling.

http://en.wikipedia.org/wiki/Windows_Media_Photo

The XPM format

• is the default X Window System picture format (very popular in the Linux world). Its structure is based on the string format of the C programming language. Because XPM was designed to be human-readable, and is stored as uncompressed plain-text, the file size of these pictures can be more than twice as large as uncompressed binary bitmap files (such as BMP, uncompressed TIFF, MacOS-PICT, or Irix-RGB formats). This format is unsupported by most non-Unix software and operating systems (though many web-browsers retain display support for the XBM subset, which was the minimal image format in the early days of the WWW).

http://en.wikipedia.org/wiki/XPM_%28image_format%29

http://en.wikipedia.org/wiki/X_Window_System

http://en.wikipedia.org/wiki/C_Programming_Language

http://en.wikipedia.org/wiki/XBM

The MrSID (Multiresolution Seamless Image Database) format

• is a wavelet compression format used mostly by Geographic Information Systems to store massive satellite imagery for map software.

http://en.wikipedia.org/wiki/MrSID

http://en.wikipedia.org/wiki/Wavelet_compression

http://en.wikipedia.org/wiki/Geographic_information_system

http://en.wikipedia.org/wiki/Satellite_imagery

Vector formats

– See also: Encapsulated PostScript, PDF, SWF, Windows Metafile, AutoCAD DXF, and CorelDRAW CDR

• As opposed to the raster image formats above (where the data describes the characteristics of each individual pixel), vector image formats contain a geometric description which can be rendered smoothly at any desired display size.

• Vector file formats can contain bitmap data as well. 3D graphic file formats are technically vector formats with pixel data texture mapping on the surface of a vector virtual object, warped to match the angle of the viewing perspective.

• At some point, all vector graphics must be rasterized in order to be displayed on digital monitors. However vector images can be displayed with analog CRT technology such as what is used in some electronic test equipment, medical monitors, radar displays, laser shows and early video games. Plotters are printers that use vector data rather than pixel data to draw graphics.
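A small Python sketch of rasterization: the "vector" description here is just a centered circle, and the function renders it onto a character grid of whatever size is asked for. The same description yields a sharper picture on a bigger grid, which is the point of vector formats. The function name and the "#" and "." characters are arbitrary choices.

```python
def rasterize_circle(size):
    """Render a centered circle onto a size x size grid of characters."""
    center = (size - 1) / 2
    radius = size / 2
    rows = []
    for y in range(size):
        row = ""
        for x in range(size):
            # A pixel is "on" if its center lies inside the circle.
            inside = (x - center) ** 2 + (y - center) ** 2 <= radius ** 2
            row += "#" if inside else "."
        rows.append(row)
    return rows


for size in (5, 11):
    print("\n".join(rasterize_circle(size)))
    print()
```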

http://en.wikipedia.org/wiki/Encapsulated_PostScript

http://en.wikipedia.org/wiki/PDF

http://en.wikipedia.org/wiki/SWF

http://en.wikipedia.org/wiki/Windows_Metafile

http://en.wikipedia.org/wiki/AutoCAD_DXF


http://en.wikipedia.org/wiki/Raster

http://en.wikipedia.org/wiki/Vector_graphics

http://en.wikipedia.org/wiki/3D_graphics

http://en.wikipedia.org/wiki/Texture_mapping

http://en.wikipedia.org/wiki/Cathode_ray_tube

SVG (Scalable Vector Graphics)

• is an open standard created and developed by the World Wide Web Consortium to address the need (and attempts of several corporations) for a versatile, scriptable and all-purpose vector format for the web and otherwise. The SVG format does not have a compression scheme of its own, but due to the textual nature of XML, an SVG graphic can be compressed using a program such as gzip. Because of its scripting potential, SVG is a key component in web applications: interactive web pages that look and act like applications.
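Because SVG is XML text, it can be read by any XML parser and compressed with gzip. The Python sketch below uses a tiny hand-written SVG drawing (one red circle, made up for this note) to show both. A file this small will not necessarily get smaller when gzipped, so only the exact round trip is shown.

```python
import gzip
import xml.etree.ElementTree as ET

svg = (
    '<svg xmlns="http://www.w3.org/2000/svg" width="100" height="100">'
    '<circle cx="50" cy="50" r="40" fill="red" />'
    '</svg>'
)

# SVG is XML, so an ordinary XML parser can read it.
root = ET.fromstring(svg)
print(root.tag)  # {http://www.w3.org/2000/svg}svg

# Being plain text, it can be gzipped and restored exactly.
packed = gzip.compress(svg.encode("utf-8"))
print(gzip.decompress(packed).decode("utf-8") == svg)  # True
```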

2.10 Frames

• Frames are rectangular sections of the display window, each of which can display a different document

• Because frames are no longer part of XHTML, you cannot validate a document that includes frames

• The <frameset> tag specifies the number of frames and their layout in the window

• <frameset> takes the place of <body>

– Cannot have both!

• <frameset> must have either a rows attribute or a cols attribute, or both (usually the case)

– Default is 1

• The possible values for rows and cols are numbers, percentages, and asterisks

– A number value specifies the row height in pixels - Not terribly useful!

– A percentage specifies the percentage of total window height for the row - Very useful!

2.10 Frames (continued)

– An asterisk after some other specification gives the remainder of the height of the window

– Examples:

<frameset rows = "150, 200, 300">

<frameset rows = "25%, 50%, 25%">

<frameset rows = "50%, 20%, *" >

<frameset rows = "50%, 25%, 25%" cols = "40%, *">

• The <frame> tag specifies the content of a frame

• The first <frame> tag in a <frameset> specifies the content of the first frame, etc.

– Row-major order is used

– Frame content is specified with the src attribute

– Without a src attribute, the frame will be empty (such a frame CANNOT be filled later)

• If <frameset> has fewer <frame> tags than frames, the extra frames are empty
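The rows and cols values and the row-major filling rule can be worked through in Python. This is a simplified sketch, not what a browser actually does: it assumes asterisks split the leftover space evenly and ignores the rescaling browsers apply when the sizes do not add up. The function names and file names are made up.

```python
def track_sizes(spec, total):
    """Turn a rows/cols value such as "50%, 20%, *" into pixel sizes."""
    sizes = []
    for part in (p.strip() for p in spec.split(",")):
        if part == "*":
            sizes.append(None)                 # filled in below
        elif part.endswith("%"):
            sizes.append(total * float(part[:-1]) / 100)
        else:
            sizes.append(float(part))          # plain number: pixels
    used = sum(s for s in sizes if s is not None)
    stars = sizes.count(None)
    if stars:
        share = max(total - used, 0) / stars   # assumption: even split
        sizes = [share if s is None else s for s in sizes]
    return sizes


def assign_frames(rows, cols, sources):
    """Pair each grid cell with a frame source in row-major order."""
    cells = [(r, c) for r in range(rows) for c in range(cols)]
    # Cells without a matching <frame> stay empty (None).
    padded = list(sources) + [None] * (len(cells) - len(sources))
    return dict(zip(cells, padded))


print(track_sizes("50%, 20%, *", 600))  # [300.0, 120.0, 180.0]
print(assign_frames(3, 2, ["a.html", "b.html", "c.html"]))
```

In the second call the grid has six cells but only three sources, so the first row and the left cell of the second row are filled and the rest stay empty.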


• Scrollbars are implicitly included if needed (they are needed if the specified document will not fit)

• If a name attribute is included, the content of the frame can be changed later (by selection of a link in some other frame)

SHOW frames.html

• Note: the Frameset standard must be specified in the DOCTYPE declaration


<html xmlns = "http://www.w3.org/1999/xhtml">
  <head>
    <title> Table of Contents Frame </title>
  </head>
  <body>
    <h4> Fruits </h4>
    <ul>
      <li> <a href = "apples.html" target = "descriptions"> apples </a> </li>
      <li> <a href = "bananas.html" target = "descriptions"> bananas </a> </li>
      <li> <a href = "oranges.html" target = "descriptions"> oranges </a> </li>
    </ul>
  </body>
</html>


• Nested frames - to divide the screen in more interesting ways

SHOW nested_frames.html

2.11 Syntactic Differences between HTML & XHTML

• Case sensitivity

• Closing tags

• Quoted attribute values

• Explicit attribute values

• id and name attributes

• Element nesting
