HTML File Formats
HTML 3.2 - 4.01
- Based loosely around SGML; v4 more so
- Both went beyond the browser tech of the time
- Less specific standards
- 1997: 4.0; minor complaints turned into 4.01 around 1999
- 4.01 Transitional is the most common HTML
- 4.01 Strict is not forgiving (for HTML; it's more lax than XHTML)
- 4.01 Frameset is for doing frameset pages only: a subset for speed etc.
- NOT VALID XML
XHTML v1.0 - 1.1 - 2.0
- 2000
- XML based HTML (instead of SGML)
- All tags MUST have END TAGS
- XML DTD, Schema, and Relax-NG definition files can be used to validate it
- Far more details specified so browsers act more similarly
XML
- eXtensible Markup Language
- Concept came 1st; formal language came much later
- Make your own HTML-like language!
- Validate files for errors generically (XML)
- Also validate against standard definition files
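The end-tag rule and the idea of checking a file "generically" can be sketched as a small well-formedness check. This is only an illustration: the function name `isWellFormed` is made up for this example, and the check deliberately ignores comments, CDATA, DOCTYPE lines, and attribute values containing ">". A real XML parser does far more.

```javascript
// Sketch: every start tag must have a matching end tag, closed in the right order.
// Simplified on purpose; not a real XML validator.
function isWellFormed(markup) {
  // Captures: optional "/" (end tag), tag name, optional trailing "/" (self-closing).
  const tagPattern = /<(\/?)([A-Za-z][\w:-]*)[^>]*?(\/?)>/g;
  const open = []; // stack of tag names still waiting for their end tag
  let match;
  while ((match = tagPattern.exec(markup)) !== null) {
    const [, closing, name, selfClosing] = match;
    if (selfClosing) continue;            // <br /> opens and closes in one tag
    if (!closing) { open.push(name); continue; }
    if (open.pop() !== name) return false; // end tag does not match the last start tag
  }
  return open.length === 0;               // nothing may be left open
}

console.log(isWellFormed("<p>Hi <b>there</b></p>")); // true
console.log(isWellFormed("<p>Line<br></p>"));        // false: <br> has no end tag
```

The second call shows why ordinary HTML 4 pages are usually not valid XML: a bare `<br>` is fine in HTML but leaves a tag open under XML rules.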
XML Formats
- MathML - math notation
- SVG - vector graphics
- SMIL - generic animation (meant to be merged into other XML formats)
- XForms - business forms (too complex)
- ODF - office software
- XSLT - convert XML documents into other formats
- SOAP - XML RPC
- Schema - doc type def
- Relax-NG - doc type def
- PLIST - property list
HTML5
- TWO technical flavors: SGML based and XML based (depends on file header)
- Despite being SGML and XML it is supposed to be basically the same
- Changes / fixes from the 10 years of HTML 4 use and failure of XHTML to take over
- Many useful content tags
- Goofy file header is actually from SGML (old)
- All HTML should have a DOCTYPE at the TOP
- XHTML is XML based (not SGML), and XML itself does not need a DOCTYPE, but XHTML 1.0 and 1.1 pages still need one to validate
- You don't need to get it: copy and paste depending on the HTML standard you use
- Validators depend heavily on it
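XHTML 1.0 Transitional:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
rest of page
- BEST version of HTML to use right now
- XHTML works better but strict isn't fun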
Structured Content Philosophy
Structure 4 Meaning
- Content meaning is tagged:
  - Tag application is more consistent
  - Special browsers can act smarter
  - Aids in language translation, localization
- Presentation usually follows meaning
Text Meaning Tags
abbr, address, blockquote, caption, cite, code, dd, del, dfn, div, dl, dt, em, h1-h6, ins, kbd, li, ol, p, q, samp, strong, sub, sup, ul, var
Presentation Tags
area, b, blockquote *, br, div, h1-h6 *, hr, i, map, pre, span, style, sub *, sup *, table, tbody, td, tfoot, th, thead, tr
HTML 5 Tags
article, aside, b *, datagrid, details, dialog, header, i *, figure, footer, meter, nav, output, section, time
Tags Worth Using
a, abbr, address, area, b, base, blockquote, body, br, button, caption, cite, code, dd, del, dfn, div, dl, dt, em, fieldset, form, h1-h6, head, hr, html, i, img, input, ins, kbd, label, legend, li, link, map, meta, noscript, object, ol, optgroup, option, p, param, pre, q, samp, script, select, span, strong, style, sub, sup, table, tbody, td, textarea, tfoot, th, thead, title, tr, ul, var
The Object Perspective
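OOP might be like:
var X = document.createElement("p");   // create a new <p> tag object
X.setAttribute("align", "center");     // set an attribute on it
X.innerHTML = "Paragraph of text";     // fill in its content
document.body.appendChild(X);          // attach it to its parent in the page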
A Tag Element Object
Parsing Overview
- Generic SGML / XML parsed:
- Parsed TAG data:
  - Tag/Element name
  - attributes
- Construct New Object( with these attributes )
- Attach this new Object to its Parent Object to maintain the relationship between the tags: p.children= b;
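The steps above (read the tag name and attributes, construct an object, attach it to its parent) can be sketched in a few lines. This is a toy, not how any real browser parses: `parseTags` and the `{ name, attributes, children }` object shape are assumptions made for this example, and the input is assumed to be well-formed with double-quoted attribute values.

```javascript
// Sketch of the parsing steps: read each tag, build an object, attach it to its parent.
// Assumes well-formed, XML-style input with double-quoted attribute values.
function parseTags(markup) {
  const root = { name: "#root", attributes: {}, children: [] };
  const stack = [root]; // the last entry is the current parent object
  // Either a tag (end-tag slash, name, attribute text, self-closing slash) or a run of text.
  const token = /<(\/?)([A-Za-z][\w-]*)([^>]*?)(\/?)>|([^<]+)/g;
  let m;
  while ((m = token.exec(markup)) !== null) {
    const [, closing, name, attrText, selfClosing, text] = m;
    const parent = stack[stack.length - 1];
    if (text !== undefined) {
      if (text.trim()) parent.children.push(text); // keep non-blank text as a child
    } else if (closing) {
      stack.pop();                                 // end tag: go back up to the parent
    } else {
      const element = { name, attributes: {}, children: [] };
      for (const [, key, value] of attrText.matchAll(/([\w-]+)="([^"]*)"/g)) {
        element.attributes[key] = value;           // copy each attribute onto the object
      }
      parent.children.push(element);               // maintain the parent/child relationship
      if (!selfClosing) stack.push(element);       // following tokens belong inside it
    }
  }
  return root;
}

const tree = parseTags('<p align="center">Paragraph of <b>text</b></p>');
console.log(tree.children[0].name);             // p
console.log(tree.children[0].attributes.align); // center
console.log(tree.children[0].children[1].name); // b
```

The stack is what keeps the relationship between tags: whichever object is on top is the parent of the next tag or text that appears.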
- GIF
  - interlacing, transparent colors, animation
  - 2 - 256 colors ONLY
- JPG (JPEG)
- PNG (sometimes pronounced ping)
IMG tag attributes
Care in Selecting an Image
- Make sure a user can still read the text.
- Avoid putting text into an image.
- Do not use a large image file (less than 20 KB). More will increase load times.
- Background must look seamless, not tiled.
- Don't link to another site for an image.
JPEGs (JPG)
- JPGs can be compressed and yield smaller file sizes in some cases
- Primarily used when you want to have all 16.7 million colors
- JPEG 2000 is not widely supported
- NO transparency! NO animation
GIF
- 2 - 256 colors
- Transparency (uses one color)
- ANIMATION
- Great for small or low color images (small file)
PNG (ping)
- Portable Network Graphics
- zero quality loss
- 8-bit (2-256 color) OR 24-bit (16.7 million)
- Transparency (8-bit alpha mask)
- Animation
- MS IE
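The color counts quoted for GIF, JPG, and PNG follow directly from bit depth: each extra bit doubles the number of distinct values. A quick check of the arithmetic (the helper name `colorCount` is made up for this example):

```javascript
// Number of distinct colors available at a given bit depth.
function colorCount(bitsPerPixel) {
  return 2 ** bitsPerPixel;
}

console.log(colorCount(8));  // 256       (GIF, 8-bit PNG)
console.log(colorCount(24)); // 16777216  (the "16.7 million" of JPG and 24-bit PNG)
```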
APNG
- PNG with animation
- Similar to GIF animation
- LARGE FILES: useful only on small things
- Browsers lack support for it (2008)
Quicker Pages
- Reduce image file sizes
- Reduce number of colors in images
- Use smallest file type
- Thumbnails
- Reuse images and backgrounds
Flash
- Flash is NOT an image
- Flash is a plug-in which is widely distributed
- Flash STARTED as a vector image format
- Animation was supported
- Flash grew into a means to force Macromedia's multimedia software (Director/Shockwave) onto the web
SVG
- XML based vector graphics
- Animation supported
- Images supported: external images, like HTML does it
- Text supported
- CSS used for text & graphic presentation
- Possible to INTEGRATE inside XHTML
Find Examples
- Any webpage you can SAVE and view the code
- Desktop browsers have a View Source feature
  - view menu
  - right click context menu
- VALIDATE! It might work only for you
The key thing to remember is that these are ALL HTML file formats and the differences between them are minor (but they make a big difference to how browsers handle certain problems).
FILE is also key: it is just a file format which contains links to other files (by using URLs). The browser is essentially just downloading files over the internet, and for the types of files it understands, it displays them to you. With PDF browser support it's possible to navigate a whole web site made up entirely of PDF documents!
HTML 1 and 2 hardly existed, with the 1st one only in draft form; the web took off during HTML 3.2. HTML 4.01 is still the most commonly used version of HTML.
Browsers filled in gaps in the standard and made up their own features (MS and Netscape), creating inconsistencies even in the official features. The standard plus its difficulty level created a problem IN ADDITION to the corporations undermining the standard.
For technical browser reasons, frames were removed from HTML 4.01 and put into a spin-off, HTML 4.01 Frameset, which contains only the frameset-related HTML, as opposed to having it all exist in 1 file format like HTML 3. Browsers support much of HTML 3 using mostly the same code as for HTML 4, and HTML 4 Transitional is so lax that browsers work even when the page is in error. As a result, HTML 4 Frameset is rarely used, nor does it provide much benefit on modern browsers/computers.
XHTML was like HTML 5: minor changes made, a new base language, and stricter definitions.
Because it was based upon XML instead of SGML and had simpler rules (like tags must have end tags), they changed the name to XHTML and started over at 1.0.
XML is POWERFUL, human understandable (tagged text), and 10x slower than older binary based generic formats (less of an issue these days, when intercommunication is more important than speed).
DTD: Document Type Definition is an SGML file format which defines the tagging rules. It is legacy SGML, used for HTML 3 and 4, but it carried over into XML because XML was simpler.
SCHEMA: just like a DTD, but written in XML to be consistent; now much more popular.
RELAX-NG: more powerful and easier than the others; it offers a more readable compact format which is easily translated back into XML.
MathML started in HTML 3, but it was not ready and TOO complex. It was spun off into its own format; math notation was far more complex than initially thought, especially in trying to draw the math.
SVG is the open source FLASH format; it does vector images and animation, but it's XML based, unlike Flash, making it far more powerful. Unlike Flash, it retains its purpose, while Flash's widespread use has become leverage for pushing video/audio/scripting multimedia etc. Flash could almost become its own browser (which could happen, especially as new Flash versions add support for rendering HTML inside of Flash!). Flash is not open; its growth is largely a result of marketing.
XHTML 2 has many upset: it kills form support and requires the use of XForms.
At this time, HTML5 is just a draft.
ONE tricky rule that makes it easier for browsers also makes life hell for webmasters: tags can only have children of 1 type, block-level or inline-level content. Before, you were free (outside of impossible situations); now all direct-child tags have to be of 1 kind, block or inline only.
SGML is a precursor to HTML and XML. It's old, complex, and hardly used anywhere, but its influence was HUGE and some of the oddities in HTML came from it; DOCTYPE is one of them.
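The one-kind-of-child rule described above can be sketched as a simple check on a tag's direct children. This follows the notes' simplified version of the rule; the tag lists are abbreviated and the name `childKind` is made up for this example. The real DTDs define many more tags, and some containers (div, for instance) are allowed to hold both kinds.

```javascript
// Hypothetical, abbreviated tag lists; the real DTDs define many more.
const BLOCK_TAGS = new Set(["p", "div", "h1", "ul", "ol", "table", "blockquote", "pre"]);
const INLINE_TAGS = new Set(["a", "b", "i", "em", "strong", "span", "img", "code"]);

// Returns "block", "inline", "empty", or "mixed" for a list of direct-child tag names.
// Tag names that appear in neither list are ignored.
function childKind(childTagNames) {
  const kinds = new Set();
  for (const name of childTagNames) {
    if (BLOCK_TAGS.has(name)) kinds.add("block");
    else if (INLINE_TAGS.has(name)) kinds.add("inline");
  }
  if (kinds.size === 0) return "empty";
  if (kinds.size === 2) return "mixed"; // breaks the one-kind rule
  return [...kinds][0];
}

console.log(childKind(["p", "ul"])); // block
console.log(childKind(["em", "a"])); // inline
console.log(childKind(["p", "em"])); // mixed
```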
Browsers make decisions, especially when troubleshooting bugs, based upon the version of HTML you are using! Also, when browsers know which format you are using and you have no errors, they run FASTER and with fewer of their own bugs showing!
Yes, you can skip it, but the browser will GUESS, as will the validator, which may not produce the results you want (not to mention it might run slower for the users)!
Validators and browsers handle XHTML better and faster. HTML 4 Transitional is the MOST flexible, forgiving, and BUGGY version (and the most widely used globally).
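To make the "browser will GUESS" point concrete, here is a rough sketch of reading which flavor a page declares from its DOCTYPE. It is an assumption-laden illustration: `declaredFlavor` and the labels it returns are invented for this example and are not what any real browser or validator reports.

```javascript
// Sketch: report which HTML flavor a page declares, based only on its DOCTYPE.
// The returned labels are illustrative, not anything a real browser exposes.
function declaredFlavor(pageSource) {
  const match = /<!DOCTYPE\s+[^>]*>/i.exec(pageSource);
  if (!match) return "none (browser must guess)";
  const doctype = match[0];
  const family = /XHTML/i.test(doctype) ? "XHTML" : "HTML";
  const variant = /(Strict|Transitional|Frameset)/.exec(doctype);
  return variant ? `${family} ${variant[1]}` : family;
}

const html4 = '<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" ' +
  '"http://www.w3.org/TR/html4/loose.dtd"><html></html>';
console.log(declaredFlavor(html4));                   // HTML Transitional
console.log(declaredFlavor("<html><body></body></html>")); // none (browser must guess)
```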
Strict/Transitional: Transitional means that you are forgiven for doing some things the older way, like mixing in some dead tags or HUMAN ERROR etc. Strict is really for MACHINES, not for humans; it should never have any errors in it.
A headline is more than a presentation of one: it has meaning. You even read it aloud slightly differently than plain text.
Visually this would be strike-through. Even the standards people were confused by having multiple tags that did similar things; finally it's just this tag.
Screen reader could READ ALOUD the text with sarcastic emphasis or strongly, increasing the level depending upon how deeply nested it is (2x s