Open XML Technical Training Final Draft 1.7 2/20/2007 1-1
Office Open XML Formats Fundamentals
1
This module provides an introduction to the Ecma Office Open XML Formats. It includes an
overview of Open XML Formats and discusses how they have paved the way for more
sophisticated and powerful uses for Microsoft Office products.
Goal & Objectives
The goal of this module is to familiarize you with the features and benefits of the Open XML Formats in the
2007 Microsoft® Office system.
After completing this module, you will be able to:
Discuss the background and role of the Office Open XML File Formats in the 2007
Microsoft® Office system.
Understand the importance and benefits of XML.
Understand the role of standards organizations.
Key Concepts
XML
Extensible Markup Language (XML) - a simple, flexible text file format designed for electronic publishing and
the exchange of a wide variety of data on the Internet and elsewhere.
XML Schema
An XML reference schema provides a blueprint for the data that can be stored in an Open XML file. For
example, an XML schema for library books defines data elements such as title, author, and catalog number. An
XML schema for real estate listings defines data elements such as address, square footage, and price.
Ecma International
Ecma International is a standards organization that is the approving authority for Microsoft’s Open XML
Formats standards.
Ecma Office Open XML Formats
The Ecma Office Open XML Formats are standard file formats that describe a family of XML schemas,
collectively called Office Open XML, which define the XML vocabularies for word-processing, spreadsheet, and
presentation documents, as well as the packaging of documents that conform to these schemas. The goal is to
enable the implementation of the Office Open XML formats by the widest set of tools and platforms, fostering
interoperability across office productivity applications and line-of-business systems, as well as to support and
strengthen document archival and preservation.
Find it here: http://www.ecma-international.org
1. The Evolution of Document Authoring
As technology advances, the way people work advances as well. Moving beyond the point where paper
documents can serve as the “document of record,” the digital work style requires smarter applications capable
of producing self-aware electronic masters that survive over extended periods of time, can be shared among
many people using diverse computing platforms and know what they are and where they need to go. The
digital work style also demands smarter documents that go beyond simple fixed files and can enable advanced
productivity scenarios.
At the time of the release of Microsoft® Office 97, new binary file formats were introduced for some
Office applications. The file format design addressed the core needs of the computing environment of the
time – compact, binary file formats that could be easily exchanged on floppy disks or as e-mail attachments,
and were optimized for editing and printing by individual users.
A primary function of Word 97 was to produce electronic documents using a familiar tool. The electronic
version became a temporary state during the document authoring process. Defined as “complete” only once
the final version was printed, the distributed, printed document became the master version. This paper-
focused output reinforced the linear, static aspect of documents.
Technology has changed our workplace substantially since 1997. Now viewed as temporary and disposable,
paper copies are mainly used to facilitate tasks such as heavy reading, annotation, and other activities for
which the paper interface is better suited. The master version is now more likely to be the electronic
document rather than the paper version.
Electronic document formats have enabled a host of automatic, machine-driven processes. With the maturity
of online document libraries, search, backup, and security capabilities, the value in keeping the document in
electronic form has significantly increased. Collaboration that used to take place face-to-face now takes place
electronically. Activities such as records management, project management, scheduling, and tracking are now
handled electronically.
The way people create documents is also undergoing an important transformation. Instead of building
documents by cutting and pasting from different data sources, and cobbling together information from old
documents, “live” connections to external data sources are used to build documents. That means any
document can be updated in real time. It also means everyone is always working with the same up-to-date
information, removing the danger of having two different sets of data included in a single document.
Organizations deploy templates that guide authors through highly regulated or structured document creation
processes to ensure the most recent and relevant data is included in documents. The ability to connect to this
external data is important to help automate the document authoring process.
The problem, however, is building templates that easily connect to these external data sources. Currently,
solution providers have limited access to binary file format specifications, do not have easy access to the
contents, and are forced to rely on integration with the applications that author the documents. The opaque
nature of a binary file format, together with potential file corruption issues and application version conflicts,
prevents solution providers from quickly accessing and interacting with the contents.
The ability to share documents and integrate external data requires that applications and file formats support
broadly accessible, standardized integration technologies, such as XML and Web services. These standardized
integration technologies enable developers to build powerful solutions without having to decode proprietary
applications or file formats. Standards-based integration technology frees solution providers from focusing on
the application that creates the information, and enables them to focus more on the solution they need to
build.
The original XML-based format options with Office 2003 sparked a new level of interest in XML document
solutions from hundreds of thousands of developers. To help broaden access to the digital work style, the
2007 Microsoft® Office system release introduces Open XML Formats that are specifically designed to enable
interoperability with other systems and programs, and that significantly improve the ability to create and
share smarter documents.
2. How XML Enables the Future of Document Authoring
A Brief History of XML
XML has been in use for about ten years, and an XML file is just plain text. It is, by definition, an eXtensible
Markup Language. Because it is extensible, XML and the languages defined using XML will continue to evolve for
as long as they remain in use.
There are specialized versions of XML used for many different purposes. For example, SportsML is a markup
language that enables sportswriters and statisticians to exchange information about events in a uniform
language. DocBook provides a system for writing structured documents using SGML or XML. (SGML
stands for Standard Generalized Markup Language, a predecessor to XML that defined structure for content.)
DocBook XML is slowly replacing the older, once widespread SGML standard as the markup used for
publishing long, often technical, documents. All XML standards have their roots in the SGML standard. The
ISO approved its first SGML standard in 1986.
Here’s a short sample of some Open XML from a spreadsheet file:
<sheetData>
<row r="1" spans="1:3">
<c r="A1"><v>1</v></c>
<c r="B1"><v>2</v></c>
<c r="C1"><v>3</v></c>
</row>
</sheetData>
Here, each row element represents a spreadsheet row and each c element a cell. The r attribute gives the row
number or cell reference (such as A1), and the v element holds the corresponding value.
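As a minimal sketch of how a program might read this fragment without a spreadsheet application, the Python example below parses it with the standard library. It assumes the fragment exactly as quoted, with no namespace declaration; worksheet parts in real files declare a default XML namespace, so element lookups there would need namespace-qualified names. The function name read_cells is hypothetical.

```python
import xml.etree.ElementTree as ET

# The spreadsheet fragment quoted above, unchanged.
SHEET_DATA = """
<sheetData>
  <row r="1" spans="1:3">
    <c r="A1"><v>1</v></c>
    <c r="B1"><v>2</v></c>
    <c r="C1"><v>3</v></c>
  </row>
</sheetData>
"""


def read_cells(xml_text):
    """Return a dict mapping each cell reference (e.g. "A1") to its stored value."""
    root = ET.fromstring(xml_text)
    cells = {}
    for row in root.findall("row"):
        for cell in row.findall("c"):
            value = cell.find("v")
            # A cell without a <v> child has no stored value.
            cells[cell.get("r")] = value.text if value is not None else None
    return cells


cells = read_cells(SHEET_DATA)
print(cells)  # {'A1': '1', 'B1': '2', 'C1': '3'}
```

The values come back as text; converting them to numbers is left to the caller because the fragment alone does not say which type each cell holds.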
The Role of XML File Formats
For a number of years, IT professionals have strived to reduce operational costs by achieving greater
interoperability between programs and streamlining processes that involve information exchange. Many have
expressed a desire to move away from proprietary systems and formats towards ‘open’ standards that enable
true interoperability and information exchange.
To that end, a new level of transparency and openness is being established to ensure that organizations are
not locked into a single vendor or platform, and can easily transfer documents and share information across
applications and systems.
To facilitate the openness and transparency required to participate in the shifting work paradigm, and to
transition away from the locked-in approach to system design, new ‘open’ standards and open file formats
have been established for the various software applications used for word processing, spreadsheet generation
and the creation of presentations.
Microsoft first began transitioning toward XML with Office 2000, which introduced XML-based document
properties. Office XP followed with SpreadsheetML, a markup language for representing Microsoft Excel
workbooks in XML format.
The XML investment continued in Office 2003 with the introduction of WordProcessingML, an XML-based file
format for Microsoft Word documents. The 2007 Microsoft® Office system continues this evolution by
improving the XML capabilities of SpreadsheetML and WordProcessingML, and introducing the new
PresentationML.
Openness & Transparency
The Open XML Formats offer greater transparency and openness than were previously possible with binary file
formats. Based on XML, these formats are open to inspection and are designed with long-term robustness and
accessibility in mind. Organizations can now have greater access to document contents without being
dependent on any particular software application or software vendor.
The Ecma Office Open XML Formats are published as an Ecma International standard. Open file format
licenses ensure that any technology provider can freely incorporate Open XML Formats into their technologies
without financial or other consideration to Microsoft. The published specifications and access to the full
documentation mean that anyone can quickly learn how to integrate Office files into their solutions.1
1 Refer to Ecma: http://www.ecma-international.org/news/TC45_current_work/TC45-2006-50_draft14.htm.
Interoperability
Much like the Office system itself, the Open XML Formats are designed to provide interoperability. The open
and transparent nature of the Open XML Formats ensures solution developers can integrate document
contents into solutions that automate business processes, facilitate document assembly and enable a host of
other advanced scenarios.
Open XML Formats offer a way to achieve industry alignment using standardized technologies and enable
complete data interoperability between documents, applications and systems. Solution developers can now
build intelligent applications that improve data context and quality, and allow information to be captured and
reused between many data sources.
Going beyond the fundamentals of XML-based document descriptions, the Open XML Formats enable
organizations to use their own XML vocabularies to capture information within documents. This separates the
information in the document from the presentation, making the data highly portable to other applications and
systems.
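To illustrate the separation of information from presentation, here is a small Python sketch. The vocabulary is hypothetical: it borrows the real estate listing elements named under Key Concepts (address, square footage, price), and the element names, sample values, and function names are assumptions made for illustration. It is not part of the Open XML Formats themselves.

```python
import xml.etree.ElementTree as ET

# Hypothetical custom vocabulary; element names and values are illustrative only.
CUSTOM_XML = """
<listing>
  <address>12 Example Street</address>
  <squareFootage>1850</squareFootage>
  <price currency="USD">325000</price>
</listing>
"""


def extract_listing(xml_text):
    """Pull the business data out of the custom XML, independent of any layout."""
    root = ET.fromstring(xml_text)
    return {
        "address": root.findtext("address"),
        "square_footage": int(root.findtext("squareFootage")),
        "price": int(root.findtext("price")),
    }


def render_summary(listing):
    """One possible presentation of the same data; others could reuse the dict."""
    return f"{listing['address']}: {listing['square_footage']} sq ft, ${listing['price']:,}"


listing = extract_listing(CUSTOM_XML)
print(render_summary(listing))  # 12 Example Street: 1850 sq ft, $325,000
```

Because the data is extracted into a plain dictionary first, a reporting system or another application could consume it without knowing how the document displays it.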
Transitioning Office Users to XML-based Formats
Customers tell us they want solutions that preserve interoperability while simultaneously expanding consumer
choices. The transition to the new Office XML formats, however, raises some important questions: Will these
formats be backward compatible? Can I use these new file formats with prior releases of Office? Will I be able
to exchange documents between 2007 Microsoft Office System and other releases of Office?
In the case of the Open XML Formats, the answer is ‘yes’ to all these questions. With the standardization of
Office Open XML file formats, customers can access their own content for generations to come. Customers
get a greater ability to manage the content of their documents, to use, reuse, and track the content via a
standard XML specification even with the many new tools that will spring up. Partners and competitors alike
will find many innovative opportunities to meet these needs.
3. The Path to Interoperability: How Office Currently Supports XML
Office is designed for interoperability, and XML integration in Office opens many new doors for information
exchange and collaboration.
Microsoft first introduced XML support to Office by pioneering XML-based document properties in Office
2000, enabling organizations to easily extract metadata from documents.
Office XP introduced SpreadsheetML as the first Office XML-based file format, enabling Excel users to store
data in an XML-based format, and enabling direct access to spreadsheet contents without using Excel. This
was a boon to developers who sought to access spreadsheet data to offer it to analysis programs and
reporting systems.
In Office 2003, Microsoft continued to make large investments in XML. WordProcessingML was introduced, as
well as support for custom-defined schemas, enabling XML-based content tagging using a taxonomy that
makes sense to individual organizations.
Other XML investment areas in Office 2003 were Reference Schemas for Word, Excel and InfoPath file formats
that were made available under an open license to help developers integrate the Office file formats more
easily.
Today, Microsoft continues to take important steps to enable the openness, transparency and interoperability
required for the future of document authoring. Continued XML investments in Office enable people to fully
utilize the power of the information stored throughout the organization; they are no longer limited by product
functionality when it comes to sharing information. Users can collect, share and publish information from
numerous data sources and eliminate many of the time-consuming, error-prone tasks associated with
information gathering and document creation.
By connecting users to external data, XML support in the Office system allows communication between
business systems and data sources, and between systems that are written in different languages on different
platforms. XML extensibility tools in Office can be used to deliver smart clients based on Office technologies
that take full advantage of Web services by accessing the information directly and dynamically.
XML support in Office enables solutions that recognize the structure and meaning of the content within
documents and respond intelligently to the user. Application intelligence can be used to validate information
or data as it is input, avoiding errors and aiding in data cleansing and normalization.
The breadth of XML support within the Microsoft Office system facilitates the modern work environment,
where copy and paste, manual editing, and repeated entry of the same information are things of the past.
In addition to document formats, Office supports many XML-based data exchange methods. Word, Excel and
InfoPath all support the use of XML Web services to facilitate connections to external content.
Word Smart Document Solutions enable organizations to deploy sophisticated document templates that
combine external data sources and programming to guide users through the process of authoring highly
structured or complex documents.
Figure 1: Path to Interoperability
4. Customer Use Case Examples – Office 2003
The following table shows a few examples of how XML has been used in real-world situations. Several of these
examples are highlighted below where excerpts show the customer solutions using the 2003 Excel and Word
XML Schemas.
Find it here: www.microsoft.com/resources/casestudies.
Customer | Product | Solution
Advisory Board Company, The | Excel custom-defined schema | XML data-driven charting and presentation data for automated presentation development.
Northumberland College | Word custom-defined schema | XML-based tool to automate the processing of self-assessment reports.
Siemens | Word custom-defined schema | Data collaboration system
Gol Airlines | Excel custom-defined schema | Extract data from Open Skies® to present flight timeline information schematically in Excel 2003.
CLE British Columbia | Word custom-defined schema | XML-based authoring and document-publishing system for book-publishing
Wortmann AG | Excel custom-defined schema | Extract geographic data from Navision and import it into Excel 2003.
Open University, The | Word custom-defined schema | Content Authoring Tool: create XML structured documents that can be easily published via print and the Web.
McGraw-Hill Construction | Word custom-defined schema | An online service creates customized customer-defined views of construction information and integrated data housed in previously isolated databases.
Continental Airlines | Word custom-defined schema | Solution puts Advisories into the hands of maintenance personnel more quickly, improving the bottom line and favorable audit results
PGGM | Word + InfoPath custom-defined schema | Document info system automates doc handling, updates data, and archives electronically. XML Web services provide integration between desktop and server
Danish InfoStructure Base | XML Reference Schemas | Open publishing format for documents endorsed by government
CambridgeDoc | XML Reference Schemas | Open publishing format
Excel Custom Schema Examples
1. The Advisory Board Company
Customer Profile
The Advisory Board Company–an organization that provides best-practice research
and analysis to some of the largest and most progressive health systems and medical
centers in the United States–found it difficult and expensive to provide clients with
customized reports using standard desktop productivity software. To address this
problem, they turned to Microsoft and Dell Professional Services to develop a
comprehensive, automated presentation generation solution based on the Microsoft®
Office System. After rapid and easy development of its prototype, the Advisory
Board now has a simple, straightforward process by which it has been able to
simplify report production, lower costs, and provide its Members more valuable
information tailored to customer needs.
Business Situation
A labor-intensive report presentation production process sometimes threatened
product quality and deadlines. To continue to offer its members customized report
presentations—but do so efficiently, economically, and in a timely manner—the
Advisory Board needed a way to simplify and streamline the process of generating
presentations.
Solution
XML data-driven charting and presentation data for automated presentation
development.
Called Study Builder, the solution benefits from enhancements in Microsoft® Office
2003—specifically, its ability to use XML to store and retrieve presentation data.
Developers took advantage of the Microsoft Office PowerPoint® 2003 presentation
graphics program for the presentation output; the switch from a word processing
program solves the formatting and pagination problems experienced previously. Data
can be maintained and charts can be created in Microsoft Office Excel 2003
spreadsheets. But the key to the solution lies between these core programs, in the
ability of Office 2003 to store and retrieve presentation data using XML. The solution
uses an entirely separate document—an XML document created using Excel 2003—
to manage the relationship between the Excel charts and the presentation output.
Users first publish charts from data in the source spreadsheet using a custom Excel
client built for this purpose. They then use a custom PowerPoint 2003 add-in. The
PowerPoint add-in provides the controls necessary to match an XML document with
a PowerPoint template file to produce a properly formatted PowerPoint presentation
using the published charts.
Now users can edit presentations much more easily by refreshing slides with the most
current charts. Because the XML document serves as the intermediary between the
data and the presentation medium, this process happens seamlessly and virtually
instantaneously.
Benefits
Reduces the time required to create presentation templates by 33 percent
through automation of much of the presentation generation and update process
Helps ensure that deadlines are met through solid reliability
Creates opportunities by extending the company’s ability to produce
customized reports and presentations
2. Wortmann AG
Customer Profile
Wortmann AG is one of Germany’s foremost IT companies, manufacturing and
distributing computers and monitors through more than 5,500 retailers. It has 240
employees and 170 desktops.
Business Situation
Wortmann had a wealth of regional sales and marketing information in their Navision
system based on Microsoft Business Solutions. The company needed a tool that
would work with it to provide a visual representation of the data, improving
managers’ understanding of regional sales.
Solution
Extract geographic data from Navision and import it into Excel 2003.
Wortmann partnered with Bechtel to deploy a Microsoft Office System solution that
provides a flexible, automatic process to import data into Excel and allow users to
easily display data in Microsoft MapPoint Europe.
Benefits
Reduced data extraction time
100% increase in top customer contacts
50% reduction in sales visit costs
Saved inventory costs
Improved sales network management
Word Custom Schema Examples
1. Northumberland College
Customer Profile
Northumberland College serves 14,000 students throughout northeastern England.
The college delivers community-oriented instruction in dozens of disciplines ranging
from automotive engineering and computer programming to healthcare, social work,
and graphic design. With Microsoft’s XML solution, the College cut costs and
solidified its position as a technology leader with XML-enabled desktops.
Business Situation
To strengthen its reputation as a technology innovator and to reduce costs, the college
needed to address administrative inefficiencies in the processing of information
gathering required for faculty self-assessment reports.
Solution
XML-based tool to automate the processing of self-assessment reports.
Working with Microsoft Consulting Services and Sx3 Infrastructure Services,
Northumberland IT executives upgraded the school’s desktop productivity software
to Microsoft® Office Professional Edition 2003 and built an XML-based tool to
automate the processing of self-assessment reports.
Benefits
Reduce time gathering data for self-assessment reports from 2.5 days to 1 day,
increasing productivity for the task by 250%.
Improve data analysis in self-assessment reports.
Increase ability to attract new students.
2. CLE British Columbia
Customer Profile
Headquartered in Vancouver, the Continuing Legal Education Society of British
Columbia (CLE) provides reference materials and courses for 10,357 British
Columbia lawyers. The organization employs 35 people and has an IT department of
2.
Business Situation
Responding to market demand, CLE needed a cost-effective way to distribute its legal
reference materials online and provide rich, contextual navigation as a value-added
service to end users.
Solution
XML-based authoring and document-publishing system for book-publishing
Working with Vancouver’s Habañero Consulting Group, CLE is deploying the
Microsoft® Office System to implement an XML-based authoring and document-
publishing system for CLE’s book-publishing department. The solution includes an
existing nCompass content management system, with Microsoft Office Professional
Enterprise Edition 2003.
Benefits
Increase CLE’s revenue an estimated 43 percent
Improve editor and production staff productivity by 7 percent
Provide online access to CLE content to broaden readership
Hyperlinks cross-reference case law and statutes
Keyword searches improve legal research from the desktop
Net Present Value per user of U.S. $30,238 and payback in 19 months
3. The Open University
Customer Profile
The Open University in the United Kingdom is one of the largest distance education
providers in the world with more than 200,000 students; 28,000 are based overseas.
Business Situation
Each year the Open University produces course content in print form and also for the
Web and other media. Microsoft® Word 2003 potentially offers authors greater
ability to structure the content in a format that is suitable for the Web, eliminating
extra work for themselves and designers alike.
Solution
Content Authoring Tool—create XML structured documents that can be easily
published via print and the Web. The Content Authoring Tool enables authors to create
content in Microsoft® Office Word 2003, which supports XML structured documents
that can be easily published on the Web.
Benefits
Ease of use with standardised output.
Authors can work with XML in a familiar environment.
Fewer iterations mean faster, more efficient document production.
Time saved for designers as XML documents can be applied to the web
quickly.
4. McGraw-Hill Construction
Customer Profile
McGraw-Hill Construction is the largest information and intelligence provider to the
design and construction industry, serving more than one million customers. The
company provides project and product information, news, trends and forecasts in an
industry that exceeds U.S. $3 trillion globally.
Business Situation
The program McGraw-Hill Construction used to deliver project information to its
subscribers used technology that was expensive for McGraw-Hill to maintain.
Subscribers received updates once a day—not often enough for the fast pace and
fiercely competitive nature of today’s construction industry.
Solution
An online service creates customized customer-defined views of construction
information and integrated data housed in previously isolated databases
Together with Xerox Global Services, McGraw-Hill developed features for the
McGraw-Hill Network solution, providing subscribers with anytime-access to up-to-
the-minute data, using familiar yet powerful Microsoft® Office System programs,
while at the same time fulfilling all its mandates to keep costs low and time-to-benefit
short.
Benefits
Reduces operating costs
Increases competitive advantage by increasing customer retention and new
subscriber opportunities
Increases subscriber competitive advantage: reduces by up to 35 percent the
time subscribers spend accessing information
Use of the .NET Framework resulted in fast time to benefit; solution was
developed in only 6 weeks
5. Introducing the Open XML Formats
Office Open XML Formats are open, standardized file formats designed to provide interoperability,
transparency and compatibility for the billions of Microsoft Office documents that already exist, and those
that will be created in the future. The transition to the full XML-based environment begins with the 2007
Office System product line.
Microsoft Word, Excel and PowerPoint 2007 now use the XML-based file formats as their default formats and
the formats will be available to earlier products through a free update.
Taking advantage of an open royalty-free license, an extensible format, and compatibility with the most widely
used software, Open XML Formats enable the developer community and any technology provider to build fully
integrated, sustainable and interoperable solutions into their Office environment.
Office Open XML Formats Reference Schema
One of Microsoft’s current initiatives is the Office Open XML Formats Reference Schema (‘Schema’).
Microsoft’s Schema is an effort to establish an open and standard means of representing data.
Microsoft first published the Schema in November 2003. Since then, Microsoft has embarked upon a
standardization process for the Schema. In December 2005, Microsoft, together with nine other companies2,
submitted an upgraded version of the Schema to Ecma International for development as a standard. Formal
work on the standard has commenced, with an intention to complete the specification by the end of 2006.
The three main reference schemas include WordProcessingML, SpreadsheetML and the new PresentationML.
For example, the new PresentationML reference schema for PowerPoint enables PowerPoint files to be fully
described using XML. This opens a new world of possibilities for managing slide content and reusing slide
information. In addition to the new XML file formats, Excel 2007 includes the option to save large or complex
workbooks in a binary format.
Reference schemas will be discussed further in Module 2, Architecture and Module 4, Developer Solutions.
Standardization - Ecma International
In a marketplace of multiple competing products, standards exist to enable interoperability and help
customers achieve their goals of increased productivity and decreased costs.
Ecma International (‘Ecma’) is a non-profit industry association of technology developers, vendors and users
that sets industry technology standards. Ecma submits its work for approval as ISO, IEC, ISO/IEC and ETSI
standards, and is the main inventor and practitioner for “fast tracking” specifications through the standards
process.
2 Apple, Barclays Capital, BP, the British Library, Essilor, Intel Corporation, NextPage Inc., Statoil ASA, Toshiba.
Ecma has been involved in a variety of information and communication technology standards development
primarily related to consumer electronics and computing since 1961. Examples include ECMAScript and,
from Microsoft submissions, the CLI and C#.
Microsoft submitted its schema to Ecma International with a view to its eventual adoption as an Ecma
standard and possibly subsequent submission to the International Organization for Standardization (‘ISO’), for
consideration as an international standard.
The goal is to produce a formal standard for office productivity applications within the Ecma International
standards process, which is fully compatible with the Office Open XML Formats. The aim is to enable the
implementation of the Office Open XML Formats by a wide set of tools and platforms in order to foster
interoperability across office productivity applications and with other line-of-business systems.
The Open XML Formats specifications are published by Ecma to provide everybody with a non-proprietary
document format that is fully supported by and fully compatible with Microsoft Office.
At the latest meeting of the Ecma TC45 technical committee, the final draft of the Office Open XML
specification was approved as ready for submission to the Ecma General Assembly. The General Assembly will
review the spec and then vote on approval in December 2006, as the final step in making Open XML an official
Ecma standard. The Open XML Formats are available as a published standard from Ecma International or via
an open, royalty-free license from Microsoft.
This is exciting news for Open XML developers! It means that the ongoing changes to the spec are finally done,
and you can write code around the latest version of the spec and be confident that your Open XML documents
will conform to the standard when it's approved. So download the final draft of the spec, and start getting
creative with Open XML.
Find it here: http://www.ecma-international.org/
6. Benefits
The Open XML file formats for Microsoft® Office Word, Microsoft® Office Excel, and Microsoft® Office
PowerPoint offer several important benefits, including substantial file size reduction, improved data
recovery, and a greatly improved ability to integrate document contents into back-end systems and external
data sources.
Office Open XML Formats Benefit Highlights
Open and Royalty-Free – The specifications for the formats and schemas are under the
governance of Ecma International, and are protected under the Microsoft Open
Specification Promise. The Open XML Formats are available as a published standard
from Ecma International under an open, royalty-free perpetual license from Microsoft.
This ensures universal access to document formats, and removes restrictions for
developers and integrators seeking to implement Open XML Formats within their
solutions.
Interoperable – With industry standard XML at the core of the Office XML Formats,
exchanging data between Microsoft Office applications and enterprise business systems
is greatly simplified. Without requiring access to the Office applications, solutions can
alter information inside an Office document or create a document entirely from scratch
by using standard tools and technologies capable of manipulating XML. The new formats
enable you to build archives of documents without using Office code.
Improved, Robust Data Recovery – With more and more documents traveling as e-mail
attachments or on removable storage, network and storage failures are increasingly
likely to corrupt documents. The Office XML Formats have been designed to
be more robust than the binary formats, thereby reducing the risk of lost information
due to damaged or corrupted files. The XML file formats improve data recovery by
segmenting and separately storing each part within the file package. Modular data
storage enables files to be opened even if a component within the file is damaged. This
data compartmentalization helps prevent the entire document from being lost and
can save considerable time and money otherwise spent recovering lost data.
Improved reliability means even documents created or altered outside of Office are less
likely to corrupt.
Efficient – The Office XML Formats use standard ZIP compression technology to store
documents. Because XML is a text-based format, it compresses very well. The
combination of XML and ZIP technologies makes files universally accessible and offers
up to 50 percent smaller file sizes than comparable binary documents. This offers
potential cost savings because it reduces the disk space required to store files and
decreases the bandwidth needed to transport files by way of e-mail, over networks, and
across the Web.
Improved security – The openness of the Office XML Formats translates to more secure
and transparent files. Documents can be shared confidently because personally
identifiable information and business-sensitive information, such as user names,
comments, and file paths, can be easily identified and removed. The file formats also
help to improve security against documents with embedded code or macros because
the default file formats do not execute embedded code. Therefore, an e-mail message with
a Word document attached in the default format can be opened knowing the document does not
execute harmful code. For files that contain embedded code, scripts, or macros, the
Office XML Formats include a special-purpose format with a different extension that
enables IT staff to quickly identify files that contain code. At the most extreme level of
security, IT managers can bar files with the macro-enabled extension from their
networks entirely.
Backward-compatible – The 2007 Microsoft Office System is backward-compatible with
Office 2000, Office XP, and Office 2003. Users of these versions can adopt the new
format with little effort and continue to gain maximum benefit from existing files.
Specifically, older .doc, .xls, and .ppt binary formats can still be used and are fully
compatible with the 2007 file format. Free updates can be downloaded to enable the
older versions to open and edit files in the new format. Conversely, users who install the
2007 Office release can set the default file formats to use either the new or the older
extensions. This ensures that users can continue to work with third-party solutions
based on earlier versions, and simultaneously work with colleagues, suppliers,
customers, and others who have upgraded to the 2007 release.
Performance optimization – The transition to open document formats won’t slow down
users. Users, organizations, and developers that take advantage of the default Microsoft
Office Open XML Formats can unlock the possibilities for many new solution types and
scenarios that developers can build, without sacrificing performance.
Software Accessibility – To enable advanced support from screen readers, anyone can
write an application that accesses and/or manipulates the Open XML format files.
Organizations that combine existing business system investments with the Microsoft
2007 Office System platform, and the new XML-based file format will also benefit
because documents can be accessed as sources of data, manipulated without the Office
applications, and processed in existing enterprise solutions.
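As a rough illustration of the Interoperable and Efficient highlights above, the following Python sketch builds a
small word-processing package using only standard ZIP and XML tooling, with no Office code involved, and reports
the uncompressed XML size next to the package size. It is a minimal sketch under stated assumptions, not a
reference implementation: the helper names (build_document_xml, build_docx) are hypothetical, the part names
follow the layout Word conventionally uses, and the three parts shown are assumed to be enough for a consumer to
open the file, which should be verified against the published specification.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Namespace and content-type strings as published for the Open XML Formats.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" Target="word/document.xml"/>'
    '</Relationships>'
)


def build_document_xml(paragraphs):
    """Return the main document part with one run of text per paragraph."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>" for text in paragraphs
    )
    return (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )


def build_docx(paragraphs):
    """Return (package_bytes, raw_xml_size) for a minimal package."""
    parts = {
        "[Content_Types].xml": CONTENT_TYPES,
        "_rels/.rels": ROOT_RELS,
        "word/document.xml": build_document_xml(paragraphs),
    }
    buffer = io.BytesIO()
    # ZIP_DEFLATED is the standard ZIP compression method.
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        for name, xml in parts.items():
            package.writestr(name, xml.encode("utf-8"))
    raw_size = sum(len(xml.encode("utf-8")) for xml in parts.values())
    return buffer.getvalue(), raw_size


if __name__ == "__main__":
    data, raw = build_docx([f"Status line {i}" for i in range(500)])
    print(f"uncompressed XML: {raw} bytes, package: {len(data)} bytes")
```

The savings depend heavily on content: repetitive markup such as this compresses far better than a document
dominated by already-compressed images, so the ratio printed here should not be read as typical.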
Compatibility Overview
Users will find that addressing compatibility is a straightforward process and that existing Office documents
will work seamlessly with the 2007 Microsoft Office system. To ensure the Open XML Formats become an
important part of all Office environments, Microsoft has taken extensive measures to enable compatibility for
current Office users so they can easily exchange XML documents with older versions and integrate those
documents into other applications and systems.
To enable previous versions to read and write the new file formats, free updates can be applied to existing
installations. The patches enable older versions of Office and the Windows shell to recognize the new file
name extensions.
Bulk conversion tools will also be available. Systems administrators can select the default file type and default
compatibility mode. Defaults can be set during installation or included in policies applied to specific users or
specific roles. For example, organizations undertaking staged upgrades or staged rollouts might want to set
Office 2003 binary as the default ‘Save’ option until all desktops have been upgraded.
This innovation breaks down a substantial barrier to sharing documents. When Office 2000, XP, or 2003
with the free updates installed attempts to open a 2007 Microsoft Office system document, it will be able to
do so freely; no “Save As” operations or complicated workarounds are required. Requests such as “I can’t open
your Office Word attachment. Please save it using (my release’s) Office Word format” are a thing of the past.
Module 3, Compatibility, contains more information on conversion and compatibility issues.
7. Evolving Customer Scenarios for the Open XML Formats - Examples
Microsoft has an evolving partnership with its Office customers to enable new kinds of innovation with the
Office System. Office users place increasing demands on the software, which drives innovation from
Microsoft. Support for XML in the 2007 Office System enables new, important scenarios that address
customer needs today. As Microsoft continues to improve the Office platform, these scenarios can advance
further, which helps both customers and Microsoft move forward.
Document Assembly
Document Assembly is an important scenario for organizations that construct documents from content that
already exists. Rather than forcing users to re-create the same content repeatedly, XML can be used to aid in
the migration of content between documents. This enables a “building blocks” approach to document
creation, and represents a huge time savings.
For example, suppose two companies merge. Thousands of documents in each company will need to have the
company name, logo, and other information changed. Using older technology requires opening each
document in the application that created it, making the changes, and closing the file. Some IT departments
have written scripts to accomplish this, but these are very inefficient. It is far more efficient to simply search
through existing XML files and do the replacement using software written in a high-level language. One
program can do the entire job in a completely automated way.
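The merger example above can be sketched in a few lines of Python. This is an illustrative sketch rather than a
production tool: the function name replace_in_package is hypothetical, the parts are assumed to be UTF-8
encoded, and only XML parts under the word/ folder are rewritten, so binary parts such as logos are copied
unchanged. A plain text replacement also misses a name that the authoring application has split across several
runs of markup, so real documents may need an XML-aware pass.

```python
import io
import zipfile
from xml.sax.saxutils import escape


def replace_in_package(package_bytes, old, new):
    """Return (new_package_bytes, replacement_count) for one document."""
    # Text inside XML is stored escaped, so escape the search terms too.
    old_xml, new_xml = escape(old), escape(new)
    source = zipfile.ZipFile(io.BytesIO(package_bytes))
    output = io.BytesIO()
    replaced = 0
    with source, zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for info in source.infolist():
            data = source.read(info.filename)
            # Assumption: only word/*.xml parts carry the visible text.
            if info.filename.startswith("word/") and info.filename.endswith(".xml"):
                text = data.decode("utf-8")
                replaced += text.count(old_xml)
                data = text.replace(old_xml, new_xml).encode("utf-8")
            target.writestr(info.filename, data)
    return output.getvalue(), replaced
```

Running the same function over every file in a folder gives the fully automated pass the example describes;
the returned count makes it easy to log which documents were actually changed.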
Integration and Content Reuse
Office XML Formats enable rapid creation of documents from disparate data sources, accelerating document
assembly, data mining, and content reuse. Exchanging data between Microsoft Office applications and
enterprise business systems is simplified—IT can alter information inside a Microsoft Office document or
create a document from scratch using standard tools and technologies; access to Microsoft Office applications
is not required. Productivity is improved by publishing, searching, and reusing information more quickly and
accurately in the environment users choose.
When content is published in multiple locations, the ability to reuse content is critically important. This
enables businesses to work from a single source of business information, for example the financial data stored
in a sales tracking system. Instead of having many users copy and paste the data, they can use the back-end
system as a data source to populate a template area of a document. This goes a long way to ensure accuracy
and data integrity throughout an enterprise. By taking advantage of the built-in collaboration features of
Office 2007, each user can be assured they are working with the latest version of the information when
assembling their specific document.
Document Sanitization
The increased awareness of compliance and information privacy is placing new demands on software to
protect sensitive information. The ability to detect and remove comments, document versions, and personally
identifiable information helps ensure that sensitive data does not leak outside the organization. This is
especially important for client-facing communications. In the 2007 release of Office, document sanitization is simply a
matter of selecting File/Finish/Inspect Document. The Document Inspector examines the file and reports on
whether it contains comments, revisions, annotations, personal information, and many other potentially
sensitive items. Users can choose to remove some or all of the categories in the document.
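Because the package is plain ZIP and XML, a similar report can be approximated outside Office. The Python sketch
below is not the Document Inspector and covers only a few of its categories; the function name inspect_package
is hypothetical, and the part names docProps/core.xml, word/comments.xml, and word/document.xml are the
locations Word conventionally uses, assumed here rather than resolved through the package relationships.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

NS = {
    "w": "http://schemas.openxmlformats.org/wordprocessingml/2006/main",
    "dc": "http://purl.org/dc/elements/1.1/",
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
}


def inspect_package(package_bytes):
    """Report a few potentially sensitive items found in a Word package."""
    report = {"author": None, "last_modified_by": None,
              "comments": 0, "tracked_changes": 0}
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        names = set(package.namelist())
        if "docProps/core.xml" in names:
            # Core properties hold the author and last editor names.
            core = ET.fromstring(package.read("docProps/core.xml"))
            report["author"] = core.findtext("dc:creator", namespaces=NS)
            report["last_modified_by"] = core.findtext(
                "cp:lastModifiedBy", namespaces=NS)
        if "word/comments.xml" in names:
            comments = ET.fromstring(package.read("word/comments.xml"))
            report["comments"] = len(comments.findall("w:comment", NS))
        if "word/document.xml" in names:
            # Tracked insertions and deletions appear as w:ins and w:del.
            body = ET.fromstring(package.read("word/document.xml"))
            report["tracked_changes"] = (
                len(body.findall(".//w:ins", NS))
                + len(body.findall(".//w:del", NS)))
    return report
```

A script like this only detects items; removing them safely also means updating relationships and content
types, which is why the built-in inspector remains the simpler route for end users.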
Document Interrogation
The ability to reuse content and to maximize the value of this portable data depends on the ability to find
it. The support for custom schemas in Office enables users to tag data in a way that is meaningful to them, so that if
they ever need to reuse or republish that content, they can quickly search for these tags (company
name, for example) to get to the content they have stored on their system.
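A search of this kind might look like the following Python sketch. It rests on assumptions that are not stated
in this module: the tagged business data is taken to live in custom XML parts under a customXml/ folder inside
the package, elements are matched by local name regardless of namespace, and both the function name
find_tagged_values and the companyName tag used in the usage note are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET


def find_tagged_values(package_bytes, tag_name):
    """Return (part_name, text) pairs for elements whose local name matches."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        for name in package.namelist():
            # Assumption: custom-schema data is stored under customXml/.
            if not (name.startswith("customXml/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(package.read(name))
            for element in root.iter():
                # ElementTree writes tags as "{namespace}local"; keep the local part.
                local_name = element.tag.rsplit("}", 1)[-1]
                if local_name == tag_name and element.text:
                    hits.append((name, element.text.strip()))
    return hits
```

Calling find_tagged_values(data, "companyName") over each file in an archive would list every document that
carries that tag together with its value, without opening any of them in Office.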
Content Tagging
By adding a tagging schema to content, organizations can dramatically improve their content searches, as well
as improve the value of the information stored in documents. Even with all the emphasis on search
technology, the lack of a tagging taxonomy that is relevant to your business can prevent you from having the
most efficient search possible, which reduces employee productivity. Word, Excel and PowerPoint support
“smart tags.” Organizations can create their own smart tags then use them as the basis for searches.
Document Archival
XML-based document archives include both the data and the presentation information, ensuring that documents can
be accessed and consumed long into the future without vendor-specific clients or applications.
Module 4, Developer Solutions, further explores these scenarios.
8. Executive Summary
To facilitate the interoperability of documents, and enable the exchange of documents across systems and
applications, the 2007 Microsoft® Office system introduces the new default XML file formats for Microsoft
Word text processing, Excel® spreadsheet, and PowerPoint® presentation graphics programs. These new
Office Open XML formats change the way developers approach solutions based on Office documents.
The Role of XML in Office
Interoperability by design – the 2007 Office System is designed to enable interoperability of documents and
information between users, programs, systems and applications.
The 2007 Microsoft® Office system is designed to
achieve industry alignment using standardized technologies.
enable data interoperability between documents, applications and systems.
capture and reuse information to and from many data sources.
build intelligent applications that improve data context and quality.
Figure 2: The Role Of XML in Office 2007
Integration into existing enterprise architectures
With these new formats, Microsoft ensures that organizations can successfully and completely integrate the
2007 Microsoft® Office system into existing enterprise architectures. This change represents a large step
forward in extending Microsoft’s commitment to industry-standard integration technologies, to XML, and to
open, published file format specifications.
The file format is a compact, robust format that offers smaller file sizes, and improved data recovery. Because
the contents of this new file format are segmented and stored inside the file by data type, the ability for
developers to access, query, modify or repair file contents improves tremendously.
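The practical effect of this segmentation is that each part can be read and checked on its own. The Python
sketch below, with the hypothetical function name check_parts, walks a package and reports which parts are
readable and well-formed, so that one damaged part does not hide the others. It assumes the ZIP directory
itself is intact; if the container cannot be opened at all, a dedicated repair tool is needed first.

```python
import io
import zipfile
import zlib
import xml.etree.ElementTree as ET


def check_parts(package_bytes):
    """Return {part_name: "ok" or "damaged: <error type>"} for every part."""
    status = {}
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        for info in package.infolist():
            try:
                # read() verifies the stored checksum for this part only.
                data = package.read(info.filename)
                if info.filename.endswith((".xml", ".rels")):
                    ET.fromstring(data)  # well-formedness check
                status[info.filename] = "ok"
            except (zipfile.BadZipFile, zlib.error, ET.ParseError) as error:
                status[info.filename] = f"damaged: {type(error).__name__}"
    return status
```

A recovery script could use the result to copy every part marked "ok" into a fresh package and replace or
drop the damaged ones.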
The new XML-based file formats are the default for Word, Excel and PowerPoint. This means that when these
programs are installed and a new document is created, it will automatically be saved using the new XML
format and file extensions. To ensure that users of prior Office releases can open, edit, and save these
documents, the new file formats are backward compatible with versions as far back as Office 2000. Microsoft has
introduced new file extensions for the Office applications, including new extensions for templates, macros,
add-ins, and other formats.
In brief, the new default extensions for the three main Office applications include:
Word – .docx
Excel – .xlsx
PowerPoint – .pptx
Module 2, Architecture, contains more detailed information on file formats, extensions and structure.
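Because the extension signals whether a file may contain code, a simple filter can separate default documents
from macro-enabled ones. The Python sketch below is illustrative only: the function names classify and
allowed_attachments are hypothetical, and the macro-enabled extensions .docm, .xlsm, and .pptm are not listed
in this module; they are the ones used by the 2007 release. An extension can be renamed, so a stricter check
would also look inside the package.

```python
from pathlib import PurePath

DEFAULT_EXTENSIONS = {".docx": "Word", ".xlsx": "Excel", ".pptx": "PowerPoint"}
# Macro-enabled counterparts (not listed in this module; see lead-in).
MACRO_ENABLED_EXTENSIONS = {".docm": "Word", ".xlsm": "Excel", ".pptm": "PowerPoint"}


def classify(filename):
    """Return (application, macro_enabled); application is None if unknown."""
    suffix = PurePath(filename).suffix.lower()
    if suffix in DEFAULT_EXTENSIONS:
        return DEFAULT_EXTENSIONS[suffix], False
    if suffix in MACRO_ENABLED_EXTENSIONS:
        return MACRO_ENABLED_EXTENSIONS[suffix], True
    return None, False


def allowed_attachments(filenames):
    """Keep every file that does not carry a macro-enabled extension."""
    return [name for name in filenames if not classify(name)[1]]
```

For example, allowed_attachments(["plan.docx", "budget.xlsm"]) keeps only plan.docx, which mirrors the policy
of barring macro-enabled files at the network edge.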
For More Information
www.microsoft.com/office/preview
www.OpenXMLDeveloper.org
http://www.ecma-international.org/
www.Blogs.msdn.com/brian_jones
www.msdn.microsoft.com/office/xml
www.microsoft.com/technet/prodtechnol/office
www.microsoft.com/resources/casestudies
Table of Contents – Module 1
Goal & Objectives
Key Concepts
XML
XML Schema
Ecma International
Ecma Office Open XML Formats
1. The Evolution of Document Authoring
2. How XML Enables the Future of Document Authoring
A Brief History of XML
The Role of XML File Formats
Openness & Transparency
Interoperability
Transitioning Office Users to XML-based Formats
3. The Path to Interoperability: How Office Currently Supports XML
4. Customer Use Case Examples – Office 2003
Excel Custom Schema Examples
Word Custom Schema Examples
5. Introducing the Open XML Formats
Office Open XML Formats Reference Schema
Standardization - Ecma International
6. Benefits
Office Open XML Formats Benefit Highlights
Compatibility Overview
7. Evolving Customer Scenarios for the Open XML Formats - Examples
Document Assembly
Integration and Content Reuse
Document Sanitization
Document Interrogation
Content Tagging
Document Archival
8. Executive Summary
The Role of XML in Office
Integration into existing enterprise architectures
For More Information