Open XML Technical Training Final Draft 1.7 2/20/2007 1-1

Office Open XML Formats Fundamentals

1

This module provides an introduction to the Ecma Office Open XML Formats. It includes an

overview of Open XML Formats and discusses how they have paved the way for more

sophisticated and powerful uses for Microsoft Office products.

Goal & Objectives

The goal of this module is to familiarize you with the features and benefits of the Open XML Formats in the

2007 Microsoft® Office system.

After completing this module, you will be able to:

Discuss the background and role of the Office Open XML File Formats in the 2007

Microsoft® Office system.

Understand the importance and benefits of XML.

Understand the role of standards organizations.

Key Concepts

XML

Extensible Markup Language (XML) - a simple, flexible text file format designed for electronic publishing and

the exchange of a wide variety of data on the Internet and elsewhere.

XML Schema

An XML reference schema provides a blueprint for the data that can be stored in an Open XML file. For

example, an XML schema for library books defines data elements such as title, author, and catalog number. An

XML schema for real estate listings defines data elements such as address, square footage, and price.
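The library-book example can be sketched in a few lines of Python. This is an illustration only: the `book` element, the `catalogNumber` spelling, and the `missing_elements` helper are assumptions made for the example, and a real XML schema language such as XSD can express much more (data types, ordering, repetition) than this simple list of required names.

```python
import xml.etree.ElementTree as ET

# Hypothetical "blueprint" for a library book: element names every record must carry.
REQUIRED_BOOK_ELEMENTS = ["title", "author", "catalogNumber"]

# Hypothetical record that follows the blueprint.
SAMPLE_BOOK = """
<book>
  <title>XML in Practice</title>
  <author>A. Writer</author>
  <catalogNumber>QA76.76</catalogNumber>
</book>
"""


def missing_elements(xml_text, required):
    """Return the required element names that are absent from the record."""
    root = ET.fromstring(xml_text)
    present = {child.tag for child in root}
    return [name for name in required if name not in present]


if __name__ == "__main__":
    # An empty list means the record contains every required element.
    print(missing_elements(SAMPLE_BOOK, REQUIRED_BOOK_ELEMENTS))
```

A record that omits a required element, such as one with only a title, would be reported as incomplete by the same helper.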

Ecma International

Ecma International is a standards organization that is the approving authority for Microsoft’s Open XML

Formats standards.

Ecma Office Open XML Formats

The Ecma Office Open XML Formats are standard file formats that describe a family of XML schemas,

collectively called Office Open XML, which define the XML vocabularies for word-processing, spreadsheet, and

presentation documents, as well as the packaging of documents that conform to these schemas. The goal is to

enable the implementation of the Office Open XML formats by the widest set of tools and platforms, fostering

interoperability across office productivity applications and line-of-business systems, as well as to support and

strengthen document archival and preservation.

Find it here: http://www.ecma-international.org

1. The Evolution of Document Authoring

As technology advances, the way people work advances as well. Moving beyond the point where paper

documents can serve as the “document of record,” the digital work style requires smarter applications capable

of producing self-aware electronic masters that survive over extended periods of time, can be shared among

many people using diverse computing platforms, and know what they are and where they need to go. The

digital work style also demands smarter documents that go beyond simple fixed files and can enable advanced

productivity scenarios.

At the time of the release of Microsoft® Office 97, new binary file formats were introduced for some

Office applications. The file format design addressed the core needs of the computing environment of the

time – compact, binary file formats that could be easily exchanged on floppy disks or as e-mail attachments,

and were optimized for editing and printing by individual users.

A primary function of Word 97 was to produce electronic documents using a familiar tool. The electronic

version was a temporary state in the document authoring process. A document was considered “complete” only

once the final version was printed, and the distributed, printed copy became the master version. This

paper-focused output reinforced the linear, static aspect of documents.

Technology has changed our workplace substantially since 1997. Now viewed as temporary and disposable,

paper copies are mainly used to facilitate tasks such as heavy reading, annotation, and other activities for

which the paper interface is better suited. The master version is now more likely to be the electronic

document rather than the paper version.

Electronic document formats have enabled a host of automatic, machine-driven processes. With the maturity

of online document libraries, search, backup, and security capabilities, the value in keeping the document in

electronic form has significantly increased. Collaboration that used to take place face-to-face now takes place

electronically. Activities such as records management, project management, scheduling, and tracking are now

handled electronically.

The way people create documents is also undergoing an important transformation. Instead of building

documents by cutting and pasting from different data sources, and cobbling together information from old

documents, “live” connections to external data sources are used to build documents. That means any

document can be updated in real time. It also means everyone is always working with the same up-to-date

information, removing the danger of having two different sets of data included in a single document.

Organizations deploy templates that guide authors through highly regulated or structured document creation

processes to ensure the most recent and relevant data is included in documents. The ability to connect to this

external data is important to help automate the document authoring process.

The problem, however, is building templates that easily connect to these external data sources. Currently,

solution providers have limited access to binary file format specifications, do not have easy access to the

contents, and are forced to rely on integration with the applications that author the documents. The opaque

nature of a binary file format, together with potential file corruption issues and application version conflicts,

prevents solution providers from quickly accessing and interacting with the contents.

The ability to share documents and integrate external data requires that applications and file formats support

broadly accessible, standardized integration technologies, such as XML and Web services. These standardized

integration technologies enable developers to build powerful solutions without having to decode proprietary

applications or file formats. Standards-based integration technology frees solution providers from focusing on

the application that creates the information, and enables them to focus more on the solution they need to

build.

The original XML-based format options with Office 2003 sparked a new level of interest in XML document

solutions from hundreds of thousands of developers. To help broaden access to the digital work style, the

2007 Microsoft® Office system release introduces Open XML Formats that are specifically designed to enable

interoperability with other systems and programs, and has significantly improved the ability to create and

share smarter documents.

2. How XML Enables the Future of Document Authoring

A Brief History of XML

XML has been in use for about ten years, and an XML file is just plain text. It is, by definition, an eXtensible

Markup Language. Because it is extensible, XML and the languages defined using XML will continue to evolve for

as long as they remain in use.

There are specialized versions of XML used for many different purposes. For example, SportsML is a markup

language that enables sportswriters and statisticians to exchange information about events in a uniform

language. DocBook XML provides a system for writing structured documents using SGML or XML. (SGML

stands for Standard Generalized Markup Language, a predecessor to XML that defined structure for content.)

DocBook XML is slowly replacing the older, once widespread SGML standard as the markup used for

publishing long, often technical, documents. All XML standards have their roots in the SGML standard.

ISO approved its first SGML standard in 1986.

Here’s a short sample of some Open XML from a spreadsheet file:

<sheetData>

<row r="1" spans="1:3">

<c r="A1"><v>1</v></c>

<c r="B1"><v>2</v></c>

<c r="C1"><v>3</v></c>

</row>

</sheetData>

In this sample, each row element represents a row and each c element represents a cell. The r attribute holds the

row number or the cell reference (for example, A1), and the v element contains the cell's value.
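Because the markup is plain text, the sample above can be read with any general-purpose XML parser. The following minimal sketch uses Python's standard `xml.etree.ElementTree` module to collect each cell reference and its value. The `read_cells` helper is a name invented for this illustration, and the sketch assumes the fragment exactly as printed; in a real workbook these elements normally belong to an XML namespace, which the sample omits.

```python
import xml.etree.ElementTree as ET

# The spreadsheet fragment from the text, reproduced as a string.
SHEET_XML = """
<sheetData>
  <row r="1" spans="1:3">
    <c r="A1"><v>1</v></c>
    <c r="B1"><v>2</v></c>
    <c r="C1"><v>3</v></c>
  </row>
</sheetData>
"""


def read_cells(xml_text):
    """Map each cell reference (the r attribute of c) to the text of its v element."""
    root = ET.fromstring(xml_text)
    cells = {}
    for row in root.findall("row"):
        for cell in row.findall("c"):
            value = cell.find("v")
            # A cell without a v element is recorded as None.
            cells[cell.get("r")] = value.text if value is not None else None
    return cells


if __name__ == "__main__":
    for reference, value in read_cells(SHEET_XML).items():
        print(reference, value)
```

No spreadsheet application is involved at any point, which is the kind of direct access to contents that a binary format makes difficult.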

The Role of XML File Formats

For a number of years, IT professionals have strived to reduce operational costs by achieving greater

interoperability between programs and streamlining processes that involve information exchange. Many have

expressed a desire to move away from proprietary systems and formats towards ‘open’ standards that enable

true interoperability and information exchange.

To that end, a new level of transparency and openness is being established to ensure that organizations are

not locked into a single vendor or platform, and can easily transfer documents and share information across

applications and systems.

To facilitate the openness and transparency required to participate in the shifting work paradigm, and to

transition away from the locked-in approach to system design, new ‘open’ standards and open file formats

have been established for the various software applications used for word processing, spreadsheet generation

and the creation of presentations.

Microsoft first began moving toward XML-based formats with Office 2000, which introduced XML-based

document properties. Office XP then introduced SpreadsheetML, a markup language for representing Microsoft

Excel workbooks in XML format. The XML investment continued in Office 2003 with the introduction of

WordProcessingML, an XML-based file format for Microsoft Word documents. The 2007 Microsoft® Office

system continues this evolution by improving the XML capabilities of SpreadsheetML and WordProcessingML

and introducing the new PresentationML.

Openness & Transparency

The Open XML Formats offer greater transparency and openness than were previously possible with binary file

formats. Based on XML, these formats are readily open and are designed with long-term robustness and

accessibility in mind. Organizations can now have greater access to document contents without being

dependent on any particular software application or software vendor.

The Ecma Office Open XML Formats are published as an Ecma International standard. Open file format

licenses ensure that any technology provider can freely incorporate Open XML Formats into their technologies

without financial or other consideration to Microsoft. The published specifications and access to the full

documentation mean that anyone can quickly learn how to integrate Office files into their solutions [1].

[1] Refer to Ecma http://www.ecma-international.org/news/TC45_current_work/TC45-2006-50_draft14.htm.

Interoperability

Much like the Office system itself, the Open XML Formats are designed to provide interoperability. The open

and transparent nature of the Open XML Formats ensures solution developers can integrate document

contents into solutions that automate business processes, facilitate document assembly and enable a host of

other advanced scenarios.

Open XML Formats offer a way to achieve industry alignment using standardized technologies and enable

complete data interoperability between documents, applications and systems. Solution developers can now

build intelligent applications that improve data context and quality, and allow information to be captured and

reused between many data sources.

Going beyond the fundamentals of XML-based document descriptions, the Open XML Formats enable

organizations to use their own XML vocabularies to capture information within documents. This separates the

information in the document from the presentation, making the data highly portable to other applications and

systems.
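As a rough illustration of why data captured in an organization's own vocabulary is portable, the sketch below processes a small custom XML fragment without any knowledge of how the surrounding document is formatted. The `expenseReport` vocabulary, its element names, and the `total_amount` helper are all hypothetical and chosen only for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical business data expressed in an organization's own XML vocabulary.
CUSTOM_XML = """
<expenseReport>
  <employee>J. Smith</employee>
  <item><description>Hotel</description><amount>240.00</amount></item>
  <item><description>Taxi</description><amount>35.50</amount></item>
</expenseReport>
"""


def total_amount(xml_text):
    """Sum the amount of every item, ignoring anything about presentation."""
    root = ET.fromstring(xml_text)
    return sum(float(item.findtext("amount")) for item in root.findall("item"))


if __name__ == "__main__":
    print(total_amount(CUSTOM_XML))
```

Any system that understands the vocabulary could consume the same fragment, whether it arrives inside a document or from another application.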

Transitioning Office Users to XML-based Formats

Customers tell us they want solutions that preserve interoperability while simultaneously expanding consumer

choices. The transition to the new Office XML formats, however, raises some important questions: Will these

formats be backward compatible? Can I use these new file formats with prior releases of Office? Will I be able

to exchange documents between 2007 Microsoft Office System and other releases of Office?

In the case of the Open XML Formats, the answer is ‘yes’ to all these questions. With the standardization of

Office Open XML file formats, customers can access their own content for generations to come. Customers

gain a greater ability to manage, use, reuse, and track the content of their documents through a

standard XML specification, even as many new tools spring up. Partners and competitors alike

will find many innovative opportunities to meet these needs.

3. The Path to Interoperability: How Office Currently Supports XML

Office is designed for interoperability, and XML integration in Office opens many new doors for information

exchange and collaboration.

Microsoft first introduced XML support to Office by pioneering XML-based document properties in Office

2000, enabling organizations to easily extract metadata from documents.

Office XP introduced SpreadsheetML as the first Office XML-based file format, enabling Excel users to store

data in an XML-based format, and enabling direct access to spreadsheet contents without using Excel. This

was a boon to developers who sought to access spreadsheet data to offer it to analysis programs and

reporting systems.

In Office 2003, Microsoft continued to make large investments in XML. WordProcessingML was introduced, as

well as support for custom-defined schemas, enabling XML-based content tagging using a taxonomy that

makes sense to individual organizations.

Other XML investment areas in Office 2003 were Reference Schemas for Word, Excel and InfoPath file formats

that were made available under an open license to help developers integrate the Office file formats more

easily.

Today, Microsoft continues to take important steps to enable the openness, transparency and interoperability

required for the future of document authoring. Continued XML investments in Office enable people to fully

utilize the power of the information stored throughout the organization; they are no longer limited by product

functionality when it comes to sharing information. Users can collect, share and publish information from

numerous data sources and eliminate many of the time-consuming, error-prone tasks associated with

information gathering and document creation.

By connecting users to external data, XML support in the Office system allows communication between

business systems and data sources, and between systems that are written in different languages on different

platforms. XML extensibility tools in Office can be used to deliver smart clients based on Office technologies

that take full advantage of Web services by accessing the information directly and dynamically.

XML support in Office enables solutions that recognize the structure and meaning of the content within

documents and respond intelligently to the user. Application intelligence can be used to validate information

or data as it is input, avoiding errors and aiding in data cleansing and normalization.

The breadth of XML support within the Microsoft Office system facilitates the modern work environment,

where copy / paste, manual editing and continuous entry of the same information are things of the past.

In addition to document formats, Office supports many XML-based data exchange methods. Word, Excel and

InfoPath all support the use of XML Web services to facilitate connections to external content.

Word Smart Document Solutions enable organizations to deploy sophisticated document templates that

combine external data sources and programming to guide users through the process of authoring highly

structured or complex documents.

Figure 1: Path to Interoperability

4. Customer Use Case Examples – Office 2003

The following table shows a few examples of how XML has been used in real-world situations. Several of these

examples are highlighted below where excerpts show the customer solutions using the 2003 Excel and Word

XML Schemas.

Find it here: www.microsoft.com/resources/casestudies.

Customer | Product | Solution

Advisory Board Company, The | Excel custom-defined schema | XML data-driven charting and presentation data for automated presentation development.

Northumberland College | Word custom-defined schema | XML-based tool to automate the processing of self-assessment reports.

Siemens | Word custom-defined schema | Data collaboration system

Gol Airlines | Excel custom-defined schema | Extract data from Open Skies® to present flight timeline information schematically in Excel 2003.

CLE British Columbia | Word custom-defined schema | XML-based authoring and document-publishing system for book-publishing

Wortmann AG | Excel custom-defined schema | Extract geographic data from Navision and import it into Excel 2003.

Open University, The | Word custom-defined schema | Content Authoring Tool: create XML structured documents that can be easily published via print and the Web.

McGraw-Hill Construction | Word custom-defined schema | An online service creates customized customer-defined views of construction information and integrated data housed in previously isolated databases.

Continental Airlines | Word custom-defined schema | Solution puts Advisories into the hands of maintenance personnel more quickly, improving the bottom line and favorable audit results

PGGM | Word + InfoPath custom-defined schema | Document info system automates doc handling, updates data, and archives electronically. XML Web services provide integration between desktop and server

Danish InfoStructure Base | XML Reference Schemas | Open publishing format for documents endorsed by government

CambridgeDoc | XML Reference Schemas | Open publishing format

Excel Custom Schema Examples

1. The Advisory Board Company

Customer Profile

The Advisory Board Company–an organization that provides best-practice research

and analysis to some of the largest and most progressive health systems and medical

centers in the United States–found it difficult and expensive to provide clients with

customized reports using standard desktop productivity software. To address this

problem, they turned to Microsoft and Dell Professional Services to develop a

comprehensive, automated presentation generation solution based on the Microsoft®

Office System. After rapid and easy development of its prototype, the Advisory

Board now has a simple, straightforward process by which it has been able to

simplify report production, lower costs, and provide its Members more valuable

information tailored to customer needs.

Business Situation

A labor-intensive report presentation production process sometimes threatened

product quality and deadlines. To continue to offer its members customized report

presentations—but do so efficiently, economically, and in a timely manner—the

Advisory Board needed a way to simplify and streamline the process of generating

presentations.

Solution

XML data-driven charting and presentation data for automated presentation

development.

Called Study Builder, the solution benefits from enhancements in Microsoft® Office

2003—specifically, its ability to use XML to store and retrieve presentation data.

Developers took advantage of the Microsoft Office PowerPoint® 2003 presentation

graphics program for the presentation output; the switch from a word processing

program solves the formatting and pagination problems experienced previously. Data

can be maintained and charts can be created in Microsoft Office Excel 2003

spreadsheets. But the key to the solution lies between these core programs, in the

ability of Office 2003 to store and retrieve presentation data using XML. The solution

uses an entirely separate document—an XML document created using Excel 2003—

to manage the relationship between the Excel charts and the presentation output.

Users first publish charts from data in the source spreadsheet using a custom Excel

client built for this purpose. They then use a custom PowerPoint 2003 add-in. The

PowerPoint add-in provides the controls necessary to match an XML document with

a PowerPoint template file to produce a properly formatted PowerPoint presentation

using the published charts.

Now users can edit presentations much more easily by refreshing slides with the most

current charts. Because the XML document serves as the intermediary between the

data and the presentation medium, this process happens seamlessly and virtually

instantaneously.

Benefits

Reduces the time required to create presentation templates by 33 percent through automation of much of the presentation generation and update process

Helps ensure that deadlines are met through solid reliability

Creates opportunities by extending the company’s ability to produce customized reports and presentations

2. Wortmann AG

Customer Profile

Wortmann AG is one of Germany’s foremost IT companies, manufacturing and

distributing computers and monitors through more than 5,500 retailers. It has 240

employees and 170 desktops.

Business Situation

Wortmann had a wealth of regional sales and marketing information in their Navision

system based on Microsoft Business Solutions. The company needed a tool that

would work with it to provide a visual representation of the data, improving

managers’ understanding of regional sales.

Solution

Extract geographic data from Navision and import it into Excel 2003.

Wortmann partnered with Bechtel to deploy a Microsoft Office System solution that

provides a flexible, automatic process to import data into Excel and allow users to

easily display data in Microsoft MapPoint Europe.

Benefits

Reduced data extraction time

100% increase in top customer contacts

50% reduction in sales visit costs

Saved inventory costs

Improved sales network management

Word Custom Schema Examples

1. Northumberland College

Customer Profile

Northumberland College serves 14,000 students throughout northeastern England.

The college delivers community-oriented instruction in dozens of disciplines ranging

from automotive engineering and computer programming to healthcare, social work,

and graphic design. With Microsoft’s XML solution, the College cut costs and

solidified its position as a technology leader with XML-enabled desktops.

Business Situation

To strengthen its reputation as a technology innovator and to reduce costs, the college

needed to address administrative inefficiencies in the processing of information

gathering required for faculty self-assessment reports.

Solution

XML-based tool to automate the processing of self-assessment reports.

Working with Microsoft Consulting Services and Sx3 Infrastructure Services,

Northumberland IT executives upgraded the school’s desktop productivity software

to Microsoft® Office Professional Edition 2003 and built an XML-based tool to

automate the processing of self-assessment reports.

Benefits

Reduce time gathering data for self-assessment reports from 2.5 days to 1 day, increasing productivity for the task by 250%.

Improve data analysis in self-assessment reports.

Increase ability to attract new students.

2. CLE British Columbia

Customer Profile

Headquartered in Vancouver, the Continuing Legal Education Society of British

Columbia (CLE) provides reference materials and courses for 10,357 British

Columbia lawyers. The organization employs 35 people and has an IT department of

2.

Business Situation

Responding to market demand, CLE needed a cost-effective way to distribute its legal

reference materials online and provide rich, contextual navigation as a value-added

service to end users.

Solution

XML-based authoring and document-publishing system for book-publishing

Working with Vancouver’s Habañero Consulting Group, CLE is deploying the

Microsoft® Office System to implement an XML-based authoring and document-

publishing system for CLE’s book-publishing department. The solution includes an

existing nCompass content management system, with Microsoft Office Professional

Enterprise Edition 2003.

Benefits

Increase CLE’s revenue an estimated 43 percent

Improve editor and production staff productivity by 7 percent

Provide online access to CLE content to broaden readership

Hyperlinks cross-reference case law and statutes

Keyword searches improving legal research from the desktop

Net Present Value per user of U.S. $30,238 and payback in 19 months

3. The Open University

Customer Profile

The Open University in the United Kingdom is one of the largest distance education

providers in the world with more than 200,000 students; 28,000 are based overseas.

Business Situation

Each year the Open University produces course content in print form and also for the

Web and other media. Microsoft® Word 2003 potentially offers authors greater

ability to structure the content in a format that is suitable for the Web, eliminating

extra work for themselves and designers alike.

Solution

Content Authoring Tool—create XML structured documents that can be easily

published via print and the Web. The Content Authoring Tool enables authors to create

content in Microsoft® Office Word 2003, which supports XML structured documents

that can be easily published on the Web.

Benefits

Ease of use with standardised output.

Authors can work with XML in a familiar environment.

Fewer iterations mean faster, more efficient document production.

Time saved for designers as XML documents can be applied to the Web quickly.

4. McGraw-Hill Construction

Customer Profile

McGraw-Hill Construction is the largest information and intelligence provider to the

design and construction industry, serving more than one million customers. The

company provides project and product information, news, trends and forecasts in an

industry that exceeds U.S. $3 trillion globally.

Business Situation

The program McGraw-Hill Construction used to deliver project information to its

subscribers used technology that was expensive for McGraw-Hill to maintain.

Subscribers received updates once a day—not often enough for the fast pace and

fiercely competitive nature of today’s construction industry.

Solution

An online service creates customized customer-defined views of construction

information and integrated data housed in previously isolated databases

Together with Xerox Global Services, McGraw-Hill developed features for the

McGraw-Hill Network solution, providing subscribers with anytime-access to up-to-

the-minute data, using familiar yet powerful Microsoft® Office System programs,

while at the same time fulfilling all its mandates to keep costs low and time-to-benefit

short.

Benefits

Reduces operating costs

Increases competitive advantage by increasing customer retention and new

subscriber opportunities

Increases subscriber competitive advantage: reduces by up to 35 percent the

time subscribers spend accessing information

Use of the .NET Framework resulted in fast time to benefit; solution was

developed in only 6 weeks.

5. Introducing the Open XML Formats

Office Open XML Formats are open, standardized file formats designed to provide interoperability,

transparency and compatibility for the billions of Microsoft Office documents that already exist, and those

that will be created in the future. The transition to the full XML-based environment begins with the 2007

Office System product line.

Microsoft Word, Excel and PowerPoint 2007 now use the XML-based file formats as their default formats and

the formats will be available to earlier products through a free update.

Taking advantage of an open royalty-free license, an extensible format, and compatibility with the most widely

used software, Open XML Formats enable the developer community and any technology provider to build fully

integrated, sustainable and interoperable solutions into their Office environment.

Office Open XML Formats Reference Schema

One of Microsoft’s current initiatives is the Office Open XML Formats Reference Schema (‘Schema’).

Microsoft’s Schema is an effort to establish an open and standard means of representing data.

Microsoft first published the Schema in November 2003. Since then, Microsoft has embarked upon a

standardization process for the Schema. In December 2005, Microsoft, together with nine other companies [2],

submitted an upgraded version of the Schema to Ecma International for development as a standard. Formal

work on the standard has commenced, with an intention to complete the specification by the end of 2006.

The three main reference schemas are WordprocessingML, SpreadsheetML and the new PresentationML.

For example, the new PresentationML reference schema for PowerPoint enables PowerPoint files to be fully

described using XML. This opens a new world of possibilities for managing slide content and reusing slide

information. In addition to the new XML file formats, Excel 2007 includes the option to save large or complex

workbooks in a binary format.

Reference schemas are discussed further in Module 2, Architecture, and Module 4, Developer Solutions.

Standardization - Ecma International

In a marketplace of multiple competing products, standards exist to enable interoperability and help

customers achieve their goals of increased productivity and decreased costs.

Ecma International (‘Ecma’) is a non-profit, industry association of technology developers, vendors and users

that sets industry technology standards. Ecma submits its work for approval as ISO, IEC, ISO/IEC and ETSI

standards, and is the main inventor and practitioner for “fast tracking” specifications through the standards

process.

[2] Apple, Barclays Capital, BP, the British Library, Essilor, Intel Corporation, NextPage Inc., Statoil ASA, Toshiba.

Ecma has been involved in the development of a variety of information and communication technology standards,

primarily related to consumer electronics and computing, since 1961. Examples include ECMAScript and, from

Microsoft submissions, CLI and C#.

Microsoft submitted its schema to Ecma International with a view to its eventual adoption as an Ecma

standard and possibly subsequent submission to the International Organization for Standardization (‘ISO’), for

consideration as an international standard.

The goal is to produce a formal standard for office productivity applications within the Ecma International

standards process, which is fully compatible with the Office Open XML Formats. The aim is to enable the

implementation of the Office Open XML Formats by a wide set of tools and platforms in order to foster

interoperability across office productivity applications and with other line-of-business systems.

The Open XML Formats specifications are published by Ecma to provide everybody with a non-proprietary

document format that is fully supported by and fully compatible with Microsoft Office.

At the latest meeting of the Ecma TC45 technical committee, the final draft of the Office Open XML

specification was approved as ready for submission to the Ecma General Assembly. The General Assembly will

review the spec and then vote on approval in December 2006, as the final step in making Open XML an official

Ecma standard. The Open XML Formats are available as a published standard from Ecma International or via

an open, royalty-free license from Microsoft.

This is exciting news for Open XML developers! It means that the ongoing changes to the spec are finally done,

and you can write code around the latest version of the spec and be confident that your Open XML documents

will conform to the standard when it's approved. So download the final draft of the spec, and start getting

creative with Open XML.

Find it here: http://www.ecma-international.org/

6. Benefits

The Open XML file formats for Microsoft® Office Word, Microsoft® Office Excel, and Microsoft® Office

PowerPoint offer several important benefits, including substantial file size reduction, improved data

recovery, and a greatly improved ability to integrate document contents into back-end systems and external

data sources.

Office Open XML Formats Benefit Highlights

Open and Royalty-Free – The specifications for the formats and schemas are under the

governance of Ecma International, and are protected under the Microsoft Open

Specification Promise. The Open XML Formats are available as a published standard

from Ecma International under an open, royalty-free perpetual license from Microsoft.

This ensures universal access to document formats, and removes restrictions for

developers and integrators seeking to implement Open XML Formats within their

solutions.

Interoperable – With industry standard XML at the core of the Office XML Formats,

exchanging data between Microsoft Office applications and enterprise business systems

is greatly simplified. Without requiring access to the Office applications, solutions can

alter information inside an Office document or create a document entirely from scratch

by using standard tools and technologies capable of manipulating XML. The new formats

enable you to build archives of documents without using Office code.
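
As a minimal sketch of creating a document "entirely from scratch" with standard tools, the Python example below writes a three-part WordprocessingML package using only the standard library. The function name build_docx is hypothetical. The part names and namespace URIs follow the published Open XML conventions as far as we know them, but the output has not been validated here against any particular Office release.

```python
import io
import zipfile
from xml.sax.saxutils import escape

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

# Declares the content type of every part in the package.
CONTENT_TYPES = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/word/document.xml" ContentType="application/'
    'vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>'
    '</Types>'
)

# Points from the package root to the main document part.
ROOT_RELS = (
    '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
    '<Relationships '
    'xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
    '<Relationship Id="rId1" Type="http://schemas.openxmlformats.org/'
    'officeDocument/2006/relationships/officeDocument" '
    'Target="word/document.xml"/>'
    '</Relationships>'
)


def build_docx(paragraphs):
    """Return the bytes of a .docx package with one paragraph per string."""
    body = "".join(
        f"<w:p><w:r><w:t>{escape(text)}</w:t></w:r></w:p>" for text in paragraphs
    )
    document = (
        '<?xml version="1.0" encoding="UTF-8" standalone="yes"?>'
        f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("[Content_Types].xml", CONTENT_TYPES)
        package.writestr("_rels/.rels", ROOT_RELS)
        package.writestr("word/document.xml", document)
    return buffer.getvalue()
```

Writing the returned bytes to a file with a .docx extension is all that is needed to hand the result to a user; no Office code is involved at any step.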

Improved, Robust Data Recovery – With more and more documents traveling as e-mail

attachments or on removable storage, the chance of a network or storage failure increases

the possibility of document corruption. The Office XML Formats have been designed to

be more robust than the binary formats, thereby reducing the risk of lost information

due to damaged or corrupted files. The XML file formats improve data recovery by

segmenting and separately storing each part within the file package. Modular data

storage enables files to be opened even if a component within the file is damaged. This

data compartmentalization helps prevent the entire document from being lost and

potentially saves tremendous amounts of time and money spent recovering lost data.

Improved reliability means even documents created or altered outside of Office are less

likely to corrupt.
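
The effect of storing each part separately can be illustrated with a short Python sketch. It is not the recovery logic Office itself uses; it only shows, under the assumption that the package's ZIP directory is still intact, how a custom tool could read every undamaged part and report the damaged ones. The function name read_parts_tolerantly is hypothetical.

```python
import io
import zipfile
import zlib


def read_parts_tolerantly(package_bytes):
    """Read each part on its own so one damaged part does not block the rest.

    Returns (readable, damaged): a dict of part name to bytes for parts that
    decompress and pass their CRC check, and a list of part names that do not.
    """
    readable, damaged = {}, []
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        for name in package.namelist():
            try:
                readable[name] = package.read(name)
            except (zipfile.BadZipFile, zlib.error):
                # A CRC mismatch or a broken compressed stream affects this part only.
                damaged.append(name)
    return readable, damaged
```

A caller could, for example, still extract the main document text when only a comments or thumbnail part is reported as damaged.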

Efficient – The Office XML Formats use standard ZIP compression technology to store

documents. Because XML is a text-based format it compresses very well. The

combination of XML and ZIP technologies makes files universally accessible and offers

up to 50 percent smaller file sizes than comparable binary documents. This offers

potential cost savings because it reduces the disk space required to store files and

decreases the bandwidth needed to transport files by way of e-mail, over networks, and

across the Web.
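
To see why text-based XML compresses well, the following Python sketch stores a piece of markup in a ZIP package and compares the sizes. It is an illustration only: the function name zip_size_comparison is hypothetical, and the comparison is against the uncompressed XML, not against a binary .doc file, so it does not reproduce the 50 percent figure quoted above.

```python
import io
import zipfile


def zip_size_comparison(xml_text):
    """Return (raw_size, packaged_size) in bytes for one XML part."""
    raw = xml_text.encode("utf-8")
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as package:
        package.writestr("word/document.xml", raw)
    # The package size includes the ZIP headers as well as the compressed data.
    return len(raw), len(buffer.getvalue())


if __name__ == "__main__":
    sample = "<w:p><w:r><w:t>Quarterly report</w:t></w:r></w:p>" * 500
    raw_size, packaged_size = zip_size_comparison(sample)
    print(f"raw XML: {raw_size} bytes, zipped package: {packaged_size} bytes")
```

Repetitive element names are the main reason the ratio is favorable; real documents with embedded images will compress less, because image formats are usually already compressed.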

Improved security – The openness of the Office XML Formats translates to more secure

and transparent files. Documents can be shared confidently because personally

identifiable information and business sensitive information, such as user names,

comments and file paths, can be easily identified and removed. The file formats also

help to improve security against documents with embedded code or macros because

the default file formats do not execute embedded code. Therefore, an e-mail message with

a Word document attached in the default format can be safely opened knowing the document does not

execute harmful code. For files that contain embedded code, scripts, or macros, the

Office XML Formats include a special-purpose format with a different extension that

enables IT staff to quickly identify files that contain code. At the most extreme level of

security, IT managers can easily bar files with the macro-enabled extension from their

networks entirely.
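
An extension-based policy of this kind can be sketched in a few lines of Python. The extension list below is not taken from this module; it reflects the macro-enabled extensions we understand the 2007 release to use (.docm, .xlsm, .pptm and their template and add-in variants) and should be checked against Module 2 before being relied on. The function names are hypothetical.

```python
from pathlib import PurePath

# Assumed list of macro-enabled Open XML extensions (documents, templates, add-ins).
MACRO_ENABLED_EXTENSIONS = {
    ".docm", ".dotm",
    ".xlsm", ".xltm", ".xlam",
    ".pptm", ".potm", ".ppsm", ".ppam",
}


def is_macro_enabled_name(filename):
    """Return True when the file name carries a macro-enabled extension."""
    return PurePath(filename).suffix.lower() in MACRO_ENABLED_EXTENSIONS


def filter_attachments(filenames):
    """Split file names into (allowed, blocked) according to their extension."""
    allowed = [name for name in filenames if not is_macro_enabled_name(name)]
    blocked = [name for name in filenames if is_macro_enabled_name(name)]
    return allowed, blocked
```

A check on the name alone is only the coarse policy described in the text; a renamed file would pass it, so a production filter would normally inspect the package contents as well.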

Backward-compatible – The 2007 Microsoft Office System is backward-compatible with

Office 2000, Office XP, and Office 2003. Users of these versions can adopt the new

format with little effort and continue to gain maximum benefit from existing files.

Specifically, older .doc, .xls, and .ppt binary formats can still be used and are fully

compatible with the 2007 file format. Free updates can be downloaded to enable the

older versions to open and edit files in the new format. Conversely, users who install the

2007 Office release can set the default file formats to use either the new or the older

extensions. This ensures that users can continue to work with third-party solutions

based on earlier versions, and simultaneously work with colleagues, suppliers,

customers, and others who have upgraded to the 2007 release.

Performance optimization – The transition to open document formats won’t slow down

users. Users, organizations, and developers that take advantage of the default Microsoft

Office Open XML Formats can unlock the possibilities for many new solution types and

scenarios that developers can build, without sacrificing performance.

Software Accessibility – To enable advanced support from screen readers, anyone can

write an application that accesses and/or manipulates the Open XML format files.

Organizations that combine existing business system investments with the Microsoft

2007 Office System platform and the new XML-based file format will also benefit

because documents can be accessed as sources of data, manipulated without the Office

applications, and processed in existing enterprise solutions.

Compatibility Overview

Users will find that addressing compatibility is a straightforward process and that existing Office documents

will work seamlessly with the 2007 Microsoft Office system. To ensure the Open XML Formats become an

important part of all Office environments, Microsoft has taken extensive measures to enable compatibility for

current Office users so they can easily exchange XML documents with older versions and integrate those

documents into other applications and systems.

To enable previous versions to read and write the new file formats, free updates can be applied to existing

installations. The patches enable older versions of Office and the Windows shell to recognize the new file

name extensions.

Bulk conversion tools will also be available. Systems administrators can select the default file type and default

compatibility mode. Defaults can be set during installation or included in policies applied to specific users or

specific roles. For example, organizations undertaking staged upgrades or staged rollouts might want to set

Office 2003 binary as the default ‘Save’ option until all desktops have been upgraded.

This innovation breaks down a substantial barrier to sharing documents. When Office 2000, XP, or 2003

attempts to open a 2007 Microsoft Office system document, it will be able to do so freely; no “Save As”

operations or complicated workarounds are required. Requests such as “I can’t open your Office Word

attachment. Please save it using (my release’s) Office Word format” are a thing of the past.

Module 3, Compatibility, contains more information on conversion and compatibility issues.

7. Evolving Customer Scenarios for the Open XML Formats - Examples

Microsoft has an evolving partnership with its Office customers to enable new kinds of innovation with the

Office System. Office users place increasing demands on the software, which drives innovation from

Microsoft. Support for XML in the 2007 Office System enables new, important scenarios that reflect

customer needs today. As Microsoft continues to improve the Office platform, these scenarios can advance

further, which helps customers, and Microsoft, move forward.

Document Assembly

Document Assembly is an important scenario for organizations that construct documents from content that

already exists. Rather than forcing users to re-create the same content repeatedly, XML can be used to aid in

the migration of content between documents. This enables a “building blocks” approach to document

creation, and represents a huge time savings.

For example, suppose two companies merge. Thousands of documents in each company will need to have the

company name, logo, and other information changed. Using older technology requires opening each

document in the application that created it, making the changes, and closing the file. Some IT departments

have written scripts to accomplish this, but these are very inefficient. It is far more efficient to simply search

through existing XML files and do the replacement using software written in a high-level language. One

program can do the entire job in a completely automated way.
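
The company-name replacement described above might look like the following Python sketch, which rewrites the text runs of one Word package in memory. It rests on several assumptions that are not stated in this module: the main part is named word/document.xml, it is UTF-8 encoded, text runs use the w: prefix, and the phrase to replace is not split across several runs. The function name replace_in_docx is hypothetical.

```python
import io
import re
import zipfile
from xml.sax.saxutils import escape

# Matches the text content of <w:t> elements only, not <w:tab/>, <w:tbl> and so on.
TEXT_RUN = re.compile(r"(<w:t(?:\s[^>]*)?>)([^<]*)(</w:t>)")


def replace_in_docx(package_bytes, old, new):
    """Return a copy of the package with `old` replaced by `new` in body text."""
    old_xml, new_xml = escape(old), escape(new)

    def swap(match):
        # Keep the opening and closing tags, change only the text between them.
        return match.group(1) + match.group(2).replace(old_xml, new_xml) + match.group(3)

    output = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as source, \
            zipfile.ZipFile(output, "w", zipfile.ZIP_DEFLATED) as target:
        for name in source.namelist():
            data = source.read(name)
            if name == "word/document.xml":
                data = TEXT_RUN.sub(swap, data.decode("utf-8")).encode("utf-8")
            # Every other part is copied through unchanged.
            target.writestr(name, data)
    return output.getvalue()
```

Looping this function over a folder of files gives the fully automated job the text describes; headers, footers, and logos live in other parts and would need the same treatment.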

Integration and Content Reuse

Office XML Formats enable rapid creation of documents from disparate data sources, accelerating document

assembly, data mining, and content reuse. Exchanging data between Microsoft Office applications and

enterprise business systems is simplified—IT can alter information inside a Microsoft Office document or

create a document from scratch using standard tools and technologies; access to Microsoft Office applications

is not required. Productivity is improved by publishing, searching, and reusing information more quickly and

accurately in the environment users choose.

When content is published in multiple locations, the ability to reuse content is critically important. This

enables businesses to work from a single source of business information, for example the financial data stored

in a sales tracking system. Instead of having many users copy and paste the data, they can use the back-end

system as a data source to populate a template area of a document. This goes a long way to ensure accuracy

and data integrity throughout an enterprise. By taking advantage of the built-in collaboration features of

Office 2007, each user can be assured they are working with the latest version of the information when

assembling their specific document.

Document Sanitization

The increased awareness of compliance and information privacy is placing new demands on software to

protect sensitive information. The ability to detect and remove comments, document versions, and personally

identifiable information helps ensure that sensitive data does not leak outside the organization. This is especially

important for client-facing communications. In the 2007 release of Office, document sanitization is simply a

matter of selecting File/Finish/Inspect Document. The Document Inspector examines the file and reports on

whether it contains comments, revisions, annotations, personal information, and many other potentially

sensitive items. Users can choose to remove some or all of the categories in the document.
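
Because the parts of a package are plain XML, a similar report can be produced outside Office. The Python sketch below is not the Document Inspector and covers far fewer categories; it only checks three things, assuming the conventional part names word/comments.xml, word/document.xml, and docProps/core.xml and the w: prefix for tracked-change elements. The function name inspect_docx is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
CP_NS = "http://schemas.openxmlformats.org/package/2006/metadata/core-properties"


def inspect_docx(package_bytes):
    """Report comments, tracked changes, and author names found in a package."""
    findings = {"comments": False, "tracked_changes": False, "author_metadata": []}
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as package:
        names = set(package.namelist())
        # A comments part is present only when the document has comments.
        findings["comments"] = "word/comments.xml" in names
        if "word/document.xml" in names:
            body = package.read("word/document.xml")
            # Tracked insertions and deletions are wrapped in w:ins / w:del.
            findings["tracked_changes"] = b"<w:ins " in body or b"<w:del " in body
        if "docProps/core.xml" in names:
            root = ET.fromstring(package.read("docProps/core.xml"))
            for tag in (f"{{{DC_NS}}}creator", f"{{{CP_NS}}}lastModifiedBy"):
                node = root.find(tag)
                if node is not None and node.text:
                    findings["author_metadata"].append(node.text)
    return findings
```

The sketch deliberately stops at reporting. Removing a part safely also means updating the relationships and content-type entries that refer to it, which is best left to the Document Inspector or a full Open XML library.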

Document Interrogation

The ability to reuse content and to maximize the value of this portable data is predicated on the ability to find

it. The support for custom schema in Office enables users to tag data in a way that is meaningful to them, so if

they later need to reuse or republish that content, they can quickly search for these tags, like company

name, for example, to get to the content they have stored on their system.
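
Searching for such tags can be sketched with standard XML tooling. The Python example below assumes the tagged data is already available as XML text, and it uses a made-up namespace (urn:example:contracts) and tag name (companyName) purely for illustration. How Office stores custom-schema markup inside a file is covered elsewhere in this training and is not shown here.

```python
import xml.etree.ElementTree as ET


def find_tagged_values(xml_documents, namespace, tag):
    """Return (document name, text) pairs for every element with the given tag.

    xml_documents maps a document name to its XML text.
    """
    hits = []
    qualified = f"{{{namespace}}}{tag}"
    for doc_name, xml_text in xml_documents.items():
        root = ET.fromstring(xml_text)
        for node in root.iter(qualified):
            hits.append((doc_name, "".join(node.itertext()).strip()))
    return hits


if __name__ == "__main__":
    ns = "urn:example:contracts"  # hypothetical custom schema namespace
    documents = {
        "lease.xml": f'<c:contract xmlns:c="{ns}">'
                     f"<c:companyName>Contoso</c:companyName></c:contract>",
        "memo.xml": f'<c:memo xmlns:c="{ns}"><c:subject>Budget</c:subject></c:memo>',
    }
    print(find_tagged_values(documents, ns, "companyName"))
```

Because the search keys on the tag rather than on the wording of the text, it finds the company name even where the surrounding prose differs from document to document.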

Content Tagging

By adding a tagging schema to content, organizations can dramatically improve their content searches, as well

as improve the value of the information stored in documents. Even with all the emphasis on search

technology, the lack of a tagging taxonomy that is relevant to your business can prevent you from having the

most efficient search possible, which reduces employee productivity. Word, Excel and PowerPoint support

“smart tags.” Organizations can create their own smart tags then use them as the basis for searches.

Document Archival

XML-based document archives include the data and presentation information, ensuring that documents can

be accessed and consumed long into the future without vendor-specific clients or applications.

Module 4, Developer Solutions, further explores these scenarios.

8. Executive Summary

To facilitate the interoperability of documents, and enable the exchange of documents across systems and

applications, the 2007 Microsoft® Office system introduces the new default XML file formats for Microsoft

Word text processing, Excel® spreadsheet, and PowerPoint® presentation graphics programs. These new

Office Open XML formats change the way developers approach solutions based on Office documents.

The Role of XML in Office

Interoperability by design – the 2007 Office System is designed to enable interoperability of documents and

information between users, programs, systems and applications.

The 2007 Microsoft® Office system is designed to

achieve industry alignment using standardized technologies.

enable data interoperability between documents, applications and systems.

capture and reuse information to and from many data sources.

build intelligent applications that improve data context and quality.

Figure 2: The Role Of XML in Office 2007

Integration into existing enterprise architectures

With these new formats, Microsoft ensures that organizations can successfully and completely integrate the

2007 Microsoft® Office system into existing enterprise architectures. This change represents a large step

forward in extending Microsoft’s commitment to industry-standard integration technologies, to XML, and to

open, published file format specifications.

The file format is a compact, robust format that offers smaller file sizes, and improved data recovery. Because

the contents of this new file format are segmented and stored inside the file by data type, the ability for

developers to access, query, modify or repair file contents improves tremendously.

The new XML-based file formats are the default for Word, Excel and PowerPoint. This means that when these

programs are installed and a new document is created, it will automatically be saved using the new XML

format and file extensions. To ensure that users of prior Office releases can open, edit and save these

documents, these new file formats are backward compatible to Office 2000. Microsoft has introduced new file

extensions for the Office applications, including new extensions for templates, macros, add-ins and other

formats.

In brief, the new default extensions for the three main Office applications include:

Word – .docx

Excel – .xlsx

PowerPoint – .pptx

Module 2, Architecture, contains more detailed information on file formats, extensions and structure.

For More Information

www.microsoft.com/office/preview

www.OpenXMLDeveloper.org

http://www.ecma-international.org/

www.blogs.msdn.com/brian_jones

www.msdn.microsoft.com/office/xml

www.microsoft.com/technet/prodtechnol/office

www.microsoft.com/resources/casestudies

Table of Contents – Module 1

Goal & Objectives
Key Concepts
    XML
    XML Schema
    Ecma International
    Ecma Office Open XML Formats
1. The Evolution of Document Authoring
2. How XML Enables the Future of Document Authoring
    A Brief History of XML
    The Role of XML File Formats
    Openness & Transparency
    Interoperability
    Transitioning Office Users to XML-based Formats
3. The Path to Interoperability: How Office Currently Supports XML
4. Customer Use Case Examples – Office 2003
    Excel Custom Schema Examples
    Word Custom Schema Examples
5. Introducing the Open XML Formats
    Office Open XML Formats Reference Schema
    Standardization - Ecma International
6. Benefits
    Office Open XML Formats Benefit Highlights
    Compatibility Overview
7. Evolving Customer Scenarios for the Open XML Formats - Examples
    Document Assembly
    Integration and Content Reuse
    Document Sanitization
    Document Interrogation
    Content Tagging
    Document Archival
8. Executive Summary
    The Role of XML in Office
    Integration into existing enterprise architectures
For More Information