joe-gollner
A presentation developed and delivered in 1995. It was designed to be part of a larger introduction to SGML. It is interesting today because it foregrounds many (if not all - and perhaps a few extra) of the themes being touched upon in discussions of Intelligent Content. It needed to be shared just in case someone thought that this was all new.
(1995)
Why SGML?
The Need for SGML
[Figure: structure diagrams with occurrence indicators (+, ?, *): a document with title, Sub-title, figure list and para; a Course made up of Modules; and the labels data, information, knowledge]
First delivered: 1995
www.gollner.ca
What is SGML?
SGML stands for the Standard Generalized Markup Language
SGML is an international (ISO) standard
ISO 8879:1986 Information Processing - Text and
Office Systems - Standard Generalized Markup
Language (SGML)
What is SGML? Informal Definitions
SGML is a system- and processing-
independent means of representing,
creating, managing and exchanging
information.
SGML is an “intelligent markup language”
that protects the accessibility, usability, life
expectancy and value of information.
Why SGML? A Meditation on a Paper Clip
The paper clip is a
low-tech version of
hypertext – facilitating
the physical association
of documents & fragments.
Often used in addition to
electronic files where
such associations cannot be
easily shown or enforced.
SGML was created to better manage documents
Publications
Training Manuals
Specifications
Documentation
Reports
Correspondence
Policies
Procedures
Standards
Plans
Directives
Commentaries
Proposals
Most Information is held in Documents
Document Information Database Information
10% 90%
IM Budget
Allocations 90% 10%
Structured Database Information
Formalized Processes
Relational Structure
Strict Definitions
Limited Access
Stable Organizational Boundaries
Limited Flexibility
Document Information
A Document is a meaningful organization of
Information
A Document is meaningful because it is
communicated between people to achieve
specific goals
A Document combines multiple media types
together in an organized, but not strictly
predictable, form that people can use
Document Information Features
Multiple Dynamic Processes
Wide and Variable Access
Hierarchical Structure
Variable Definitions
Variable Organizational Boundaries
Document Information Conclusions
Document Information does not fit within the
conventional Database paradigm
Database Information is organized
according to the needs of the Computer
Document Information is organized
according to the needs of the User
Few of the assumptions within the Database
Paradigm apply to Documents
Document Management Technology Today
Documents and Computers
Computers help us create more paper faster
Computers help us format printed
documents more efficiently and at less cost
Computers have not helped with the
management consequences
The Document Explosion
The volume of documents is growing
exponentially
The visibility of document-based
transactions is increasing
The rise of the Internet and Enterprise
Integration dramatically alters the potential
user community of a document
Documents are becoming more complex,
larger and more varied in format
Management Breakdown
Traditional Records Management practices
and technologies cannot cope with the
volume, complexity, or volatility of computer-
generated documents
The typical response has been to extend the
Database paradigm to document information
Given currently-used technology, the best
that can be done is the “Electronic Filing
Cabinet” (old tools made electronic - again)
What’s Wrong
Computers traditionally store documents as
“objects”
Computers know very little (almost nothing)
about these objects:
some management information (author, version, date)
little awareness of document content
less awareness of document structure
Computers can only associate some
information with the objects as the objects
have no inherent “intelligence”
New Technologies
Applications have evolved to redress some
of these shortcomings
“Electronic Filing Cabinets” associate
management information with document
objects and physically control events
Full-Text Retrieval technologies have been
used to access Document “Content”
Word Processors are used to infer the
structure of documents based on format
(styles and templates)
Electronic Filing Cabinets
In an “Electronic Filing Cabinet”
environment, management information is
associated with these “objects”
Document objects that leave the sphere of
control are no longer managed
[Figure: document pages and the Sphere of Control]
Full-Text Retrieval
Create external indices of the textual content
of a document
Various text indexing algorithms are used to
support searches by word, by text string,
proximity, exclusion and so on
Useful but imprecise as document volume
increases
New technologies arising to improve search
precision (lexicon-based, links to metadata)
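The search types named on this slide (by word, proximity, exclusion) can be illustrated with a small external index. This is a minimal sketch, not the method of any particular retrieval product: the function names, the three sample documents and the tokenizing rule (lowercase runs of letters and digits) are all assumptions made for the example.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Build an external index: word -> {doc_id: [word positions]}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in documents.items():
        # Assumed tokenizer: lowercase runs of letters and digits.
        words = re.findall(r"[a-z0-9]+", text.lower())
        for position, word in enumerate(words):
            index[word][doc_id].append(position)
    return index


def search_word(index, word):
    """Search by word: ids of documents that contain it."""
    return set(index.get(word.lower(), {}))


def search_excluding(index, wanted, unwanted):
    """Exclusion: documents with `wanted` but without `unwanted`."""
    return search_word(index, wanted) - search_word(index, unwanted)


def search_near(index, first, second, max_distance):
    """Proximity: both words occur within max_distance words of each other."""
    first_postings = index.get(first.lower(), {})
    second_postings = index.get(second.lower(), {})
    hits = set()
    for doc_id in set(first_postings) & set(second_postings):
        if any(abs(a - b) <= max_distance
               for a in first_postings[doc_id]
               for b in second_postings[doc_id]):
            hits.add(doc_id)
    return hits


# Hypothetical sample documents, invented for this illustration.
documents = {
    "memo": "The review of the training manual is complete.",
    "plan": "Approval of the plan follows the policy review.",
    "spec": "The manual describes training procedures and standards.",
}
index = build_index(documents)

print(search_word(index, "review"))
print(search_excluding(index, "manual", "review"))
print(search_near(index, "training", "manual", 1))
```

The index knows only which words occur where. It has no notion of whether "manual" appears in a title, a caption or a footnote, which is one reason such searches lose precision as document volume grows.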
Word Processors
Evolving to include basic management
information (profiles)
Evolving to include template structures
(document types)
Management and structural information only
accessible through Word Processor
application (directly or via API)
These new Word Processing features are
not generally used
Proprietary Documents
The basic problem is that traditional
documents are produced and maintained in
a proprietary and non-intelligent format
Electronic Documents are simply paper
documents in a more reproducible form
Electronic Documents are printed for use
People retain and use hardcopy “files”
New Applications still assume a static
environment and single format use
Proprietary Formats
Word Processing applications offer an
enhanced implementation of the typewriter,
the copy editor and the typesetter
Word Processing applications:
Add formatting instructions to text
Execute formatting instructions to produce an output
(operating system and printer interface)
Formatting Instructions are specific to the
application that created them and the
platform on which they were created
Procedural Markup Processing Instructions
[Figure: sample page (Chapter Title, Section Title, 1) annotated with these formatting instructions:]
12 pt. bold Helvetica
10 pt. bold Helvetica
8 pt. Times on 10 pt. leading
8 pt. Times on 10 pt. leading
7 pt. Helvetica bold
Proprietary Markup Typical of Word Processors
[Center][Und On]SGML[Und Off][Hrt]
[Hrt]
[Font: Helvetica 10pt]
[Indent]Introduction[Hrt]
[Hrt]
[Font: Times Roman 8pt]
[Tab]Someday [Italic On]information
[Italic Off] will be free.[Hrt]
Position
Style
Font
Binary Storage Formats
Highly Proprietary and Optimized for Performance
[Figure: raw binary dump of a word processor file. Only fragments are readable, such as "WPC", "Courier 12pt (10cpi)", "CG Times (WN)", "Univers (WN)", "HP LaserJet III" and "HPLASIII.WRS"]
HP LaserJet!
Proprietary Documents
Are proprietary to the originating software
Limit or obstruct cross-platform interchange
Are non-intelligent
provide no consistent mechanism to determine
document context, content, or structure
provide no means to enhance automation
Support only one output rendering (print)
Will become obsolete
Information in an obsolete format
is itself obsolete!
Portability Problems
Paper remains the format for Document Interchange
Low Document Intelligence
Marginal Automated Support for Business Processes
Lack of Document Intelligence prevents
computers from providing effective
document management or workflow support
Paper remains the working medium
[Figure: paper document with the labels Approval and Review]
Single Output Formats Create Additional Costs
[Figure: WP (Proprietary Formatting) produces Printed Documents; CD ROM, WWW and Database outputs each require Conversion $]
Obsolescence
Information must survive when Products become obsolete
Multimate
WPS Plus
Display Write
Lotus Manuscript
Lanier
Wang
Mass-11
WPS-8
CPT
Word-11
NBI Legend
Xywrite
Where are they now?
Summary
Traditional computing technology and
management practices are failing to cope
with the increasing volume of documents
Non-Intelligent, Proprietary document
formatting restricts document manageability,
portability, utility, quality, affordability,
suitability for multi-format publishing, and
longevity.
Business is therefore conducted on paper!
Are your information assets
frozen in Proprietary Formats?