My presentation on Office 2007/OpenXML file formats
Creating Office Documents with Open XML
David Truxall, Ph.D.
Principal Consultant
NuSoft Solutions
Agenda
- Overview
- System.IO.Packaging
- Building Documents with .NET
Open XML
- An Ecma standard (ECMA-376) that describes a family of XML schemas
- Defines the XML vocabularies for word-processing, spreadsheet, and presentation documents
- Defines the packaging of documents that conform to these schemas
Features of Office Open XML
Support for Open XML
- iPhone, iWork, Microsoft Office, OpenOffice, Gnumeric, WordPerfect, Palm OS, NeoOffice
- PHP, Java, Monarch v.9.0, OpenXML Writer, Word Counter 2.2.1, Altsoft XML2PDF, MindMapping, XmlSpy
Open XML Format Architecture
User view: a single Office file.
Developer view: a modular file. The file container holds separate parts for document properties, comments, WordprocessingML / SpreadsheetML content, custom XML, embedded code, and images / video / sound.
Document Parts
- Most parts are XML
- Each XML part is a discrete component
- Individual parts can be added, extracted, and modified without using Office programs
- Corruption of one part need not prevent the file from opening
Open Packaging Conventions (OPC)
- Package: the container (a ZIP archive)
- Document parts: the files inside the container
- Relationships: every part that references other parts does so through a relationship
[Figure: parts of a sample workbook package: document properties, application properties, custom properties, workbook, Sheet 1 to Sheet 3, shared strings, theme]
Exploring the Document Package
Reference Schemas
- XML reference schemas: 80+ schemas make up the standard
  - Display oriented
  - Document format
- Custom schemas: specific to your business
  - Data oriented
  - Business information
Custom XML Content
- Enables interoperability with other systems
  - Documents can provide a rich view of back-end data sources
  - Documents can update back-end data sources
- Exposes business data in Open XML documents
  - Heterogeneous systems can easily read data from documents
  - Business-specific semantics can be applied to document data
- Separates presentation and data
- Simplified programming model for all of the above
Custom XML schema support was a key design objective for Open XML: any schema can be used in Open XML documents.
System.IO.Packaging
- Part of Windows Presentation Foundation
- Installed with .NET 3.0 (requires the .NET 2.0 runtime)
- Enables package manipulation for:
  - Office Open XML file formats
  - XML Paper Specification (XPS) files
  - Any Open Packaging Conventions file
The Package
Package Class
Provides methods to create, enumerate, and delete the following entities:
- Package
- Package properties
- PackageRelationships
- PackageParts
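In .NET, Package.Open and Package.GetParts cover the "open" and "enumerate" operations. To show what enumeration amounts to without depending on that API, the sketch below uses Python's standard zipfile module as a stand-in, relying only on the fact that a package is a ZIP archive. The helper name list_part_names and the placeholder part contents are illustrative assumptions.

```python
import io
import zipfile


def list_part_names(package_bytes: bytes) -> list:
    """Return the OPC part names ('/'-rooted) stored in a ZIP-based package.

    [Content_Types].xml is skipped: it describes the parts' content types
    but is not itself a part of the package.
    """
    with zipfile.ZipFile(io.BytesIO(package_bytes)) as zf:
        return sorted(
            "/" + info.filename
            for info in zf.infolist()
            if not info.is_dir() and info.filename != "[Content_Types].xml"
        )


# Build a tiny in-memory package (placeholder part contents) and enumerate it.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("[Content_Types].xml", "<Types/>")
    zf.writestr("_rels/.rels", "<Relationships/>")
    zf.writestr("xl/workbook.xml", "<workbook/>")

print(list_part_names(buf.getvalue()))  # ['/_rels/.rels', '/xl/workbook.xml']
```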
Common Package Parts
[Figure: package relationships lead to the core properties, the digital signatures, and the format-specific parts (the Office document); each of those parts uses its own part relationships to reach further XML parts, and so on.]
The PackagePart
- A PackagePart is a unit of data stored within the Package
- It supports creating, enumerating, and deleting part relationships
- Its data is accessed as a System.IO.Stream (GetStream())
- PackagePart properties: CompressionOption, ContentType, Package, Uri
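A part's ContentType comes from [Content_Types].xml: an Override entry for the exact part name wins, otherwise the Default entry for the file extension applies. The sketch below reproduces that lookup with Python's zipfile and ElementTree (a stand-in, not System.IO.Packaging) and reads a part through a file-like stream, the rough analogue of PackagePart.GetStream(). The function name part_content_type is hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET
from typing import Optional

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def part_content_type(zf: zipfile.ZipFile, part_name: str) -> Optional[str]:
    """Look up a part's content type in [Content_Types].xml.

    An <Override> for the exact part name wins; otherwise the <Default>
    registered for the part's file extension applies. Part names and
    extensions are compared case-insensitively.
    """
    types = ET.fromstring(zf.read("[Content_Types].xml"))
    for override in types.iter(CT_NS + "Override"):
        if override.get("PartName", "").lower() == part_name.lower():
            return override.get("ContentType")
    extension = part_name.rsplit(".", 1)[-1].lower()
    for default in types.iter(CT_NS + "Default"):
        if default.get("Extension", "").lower() == extension:
            return default.get("ContentType")
    return None


CONTENT_TYPES = (
    '<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">'
    '<Default Extension="rels" '
    'ContentType="application/vnd.openxmlformats-package.relationships+xml"/>'
    '<Default Extension="xml" ContentType="application/xml"/>'
    '<Override PartName="/xl/workbook.xml" ContentType='
    '"application/vnd.openxmlformats-officedocument.spreadsheetml.sheet.main+xml"/>'
    "</Types>"
)

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("[Content_Types].xml", CONTENT_TYPES)
    zf.writestr("xl/workbook.xml", "<workbook/>")

with zipfile.ZipFile(buf) as zf:
    workbook_type = part_content_type(zf, "/xl/workbook.xml")
    rels_type = part_content_type(zf, "/_rels/.rels")
    # Read the part's bytes through a file-like stream (cf. PackagePart.GetStream()).
    with zf.open("xl/workbook.xml") as stream:
        workbook_bytes = stream.read()
```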
PackageRelationship
- Required to find parts, because part names are not guaranteed
- Look up relationships by type (a PackageRelationshipCollection) or by ID
- Relationship properties: Id, Package, RelationshipType, SourceUri, TargetMode, TargetUri
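Under the hood, relationships live in small .rels XML parts. This sketch parses one with Python's ElementTree and filters by type or by ID, mirroring the lookups above; TargetMode defaults to "Internal" when the attribute is absent. The Relationship class, the helpers, and the sample .rels content (including the external hyperlink) are illustrative assumptions, not the System.IO.Packaging API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List, Optional

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
OFFICE_DOCUMENT = ("http://schemas.openxmlformats.org/officeDocument/2006/"
                   "relationships/officeDocument")


@dataclass
class Relationship:
    id: str
    type: str
    target: str
    target_mode: str  # "Internal" when the attribute is absent, else "External"


def parse_rels(xml_text: str) -> List[Relationship]:
    """Parse every <Relationship> element in a .rels part."""
    root = ET.fromstring(xml_text)
    return [
        Relationship(
            id=rel.get("Id"),
            type=rel.get("Type"),
            target=rel.get("Target"),
            target_mode=rel.get("TargetMode", "Internal"),
        )
        for rel in root.iter(REL_NS + "Relationship")
    ]


def by_type(rels: List[Relationship], rel_type: str) -> List[Relationship]:
    return [r for r in rels if r.type == rel_type]


def by_id(rels: List[Relationship], rel_id: str) -> Optional[Relationship]:
    return next((r for r in rels if r.id == rel_id), None)


# Illustrative relationship part; rId2 points outside the package.
RELS = """<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
  <Relationship Id="rId1" Target="xl/workbook.xml"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument"/>
  <Relationship Id="rId2" Target="http://example.com/" TargetMode="External"
    Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/hyperlink"/>
</Relationships>"""

rels = parse_rels(RELS)
start_part = by_type(rels, OFFICE_DOCUMENT)[0].target  # 'xl/workbook.xml'
```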
Package URI Helper
- Find a related PackagePart by searching relationships, either by relationship type or by relationship ID; a search by type returns a collection of PackageRelationship objects
- A PackageRelationship defines two relative URIs:
  - Source URI, pointing to the source PackagePart
  - Target URI, pointing to the target PackagePart
- A PackagePart is retrieved using a URI relative to the root of the Package, so the Source and Target URIs must be translated
- Use the PackUriHelper class to aid in the translation
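The translation itself is plain relative-URI resolution: a relative Target is resolved against the folder of its source part, and a source part's relationships are stored in a sibling _rels folder. In .NET, PackUriHelper.ResolvePartUri and PackUriHelper.GetRelationshipPartUri handle this; the Python sketch below only approximates them with posixpath and deliberately ignores percent-encoding and external targets. Function names are hypothetical.

```python
import posixpath


def resolve_part_uri(source_part: str, target: str) -> str:
    """Resolve a relationship Target against the part that owns it.

    Relative targets are resolved against the folder of the source part;
    targets starting with '/' are already relative to the package root.
    Use "/" as the source for package-level relationships.
    (Percent-encoding and external targets are not handled here.)
    """
    if target.startswith("/"):
        return posixpath.normpath(target)
    base_dir = posixpath.dirname(source_part)
    return posixpath.normpath(posixpath.join(base_dir, target))


def relationship_part_name(source_part: str) -> str:
    """Name of the .rels part holding the relationships of source_part."""
    if source_part == "/":
        return "/_rels/.rels"
    folder, name = posixpath.split(source_part)
    return posixpath.join(folder, "_rels", name + ".rels")


# Package-level relationship (source is the package root) to the start part.
print(resolve_part_uri("/", "xl/workbook.xml"))                      # /xl/workbook.xml
# Workbook-level relationship to a sheet.
print(resolve_part_uri("/xl/workbook.xml", "worksheets/sheet1.xml"))  # /xl/worksheets/sheet1.xml
print(relationship_part_name("/xl/workbook.xml"))                    # /xl/_rels/workbook.xml.rels
```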
System.IO.Packaging
SpreadsheetML
[Figure: SpreadsheetML package parts: workbook, properties, styles, sharedStrings, calcChain, sheet1..N, and the tables, charts, and drawings associated with the sheets]
- Workbooks, worksheets
- Rows, columns, values
- Formulas
The Minimal xlsx
- Required: workbook.xml, the document "start part"
- Required: at least one sheet (a worksheet part such as sheet1.xml)
- Required: relationship parts (.rels), which must be in a _rels folder: the package-level _rels/.rels targets the start part, and the workbook's own relationship part (e.g. xl/_rels/workbook.xml.rels) targets each sheet
- Required: [Content_Types].xml, a required part for all Open XML documents; three content types must be defined:
  - SpreadsheetML main document (for the start part)
  - Worksheet
  - Package relationships (for the required relationships)
- Everything else is optional; the worksheet's <sheetData> element is required, but may be empty
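The required pieces above can be assembled by hand. This sketch writes them into a ZIP with Python's zipfile rather than System.IO.Packaging; the xl/ and worksheets/ folder layout follows the usual Office convention and is an assumption, since parts are located through relationships rather than fixed paths.

```python
import io
import zipfile

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
REL_TYPE = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
MAIN_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"
SML_CT = "application/vnd.openxmlformats-officedocument.spreadsheetml"

PARTS = {
    # Required in every package: declares the content type of each part.
    "[Content_Types].xml": f"""<Types xmlns="{CT_NS}">
  <Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
  <Override PartName="/xl/workbook.xml" ContentType="{SML_CT}.sheet.main+xml"/>
  <Override PartName="/xl/worksheets/sheet1.xml" ContentType="{SML_CT}.worksheet+xml"/>
</Types>""",
    # Package-level relationship: package -> start part.
    "_rels/.rels": f"""<Relationships xmlns="{REL_NS}">
  <Relationship Id="rId1" Type="{REL_TYPE}/officeDocument" Target="xl/workbook.xml"/>
</Relationships>""",
    # Start part: lists the sheets by relationship id.
    "xl/workbook.xml": f"""<workbook xmlns="{MAIN_NS}" xmlns:r="{REL_TYPE}">
  <sheets><sheet name="Sheet1" sheetId="1" r:id="rId1"/></sheets>
</workbook>""",
    # Workbook-level relationship: workbook -> sheet.
    "xl/_rels/workbook.xml.rels": f"""<Relationships xmlns="{REL_NS}">
  <Relationship Id="rId1" Type="{REL_TYPE}/worksheet" Target="worksheets/sheet1.xml"/>
</Relationships>""",
    # The sheet itself; <sheetData> is required but may be empty.
    "xl/worksheets/sheet1.xml": f"""<worksheet xmlns="{MAIN_NS}"><sheetData/></worksheet>""",
}


def build_minimal_xlsx() -> bytes:
    """Zip the required parts into an in-memory .xlsx package."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, xml in PARTS.items():
            zf.writestr(name, xml)
    return buf.getvalue()
```

Writing the returned bytes to a file such as minimal.xlsx should yield an empty one-sheet workbook.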
SpreadsheetML Tables
SpreadsheetML tables provide structure and formatting for worksheet information.
Separation of presentation and data:
- Data stays in the worksheet
- The table definition lives in a separate part, referenced from the worksheet through a relationship (the tablePart element's r:id)
Open XML has different types of tables for each document type, optimized for different scenarios:
- WordprocessingML has its tbl element
- SpreadsheetML has its table element
- PresentationML uses DrawingML tables (tbl inside graphicData)
SpreadsheetML Table
Worksheet (sheet1.xml); the headings in row 1 are shared strings (t="s"):
```xml
<sheetData>
  <row r="1" spans="1:2">
    <c r="A1" t="s"><v>0</v></c>
    <c r="B1" t="s"><v>1</v></c>
  </row>
  <row r="2" spans="1:2">
    <c r="A2"><v>1</v></c>
    <c r="B2"><v>4</v></c>
  </row>
  <row r="3" spans="1:2">
    <c r="A3"><v>2</v></c>
    <c r="B3"><v>5</v></c>
  </row>
  <row r="4" spans="1:2">
    <c r="A4"><v>3</v></c>
    <c r="B4"><v>6</v></c>
  </row>
</sheetData>
...
<tableParts count="1">
  <tablePart r:id="rId2"/>
</tableParts>
```
Table definition (table1.xml):
```xml
<table … ref="A1:B4" …>
  <autoFilter ref="A1:B4"/>
  <tableColumns count="2">
    <tableColumn id="1" name="Column1"/>
    <tableColumn id="2" name="Column2"/>
  </tableColumns>
  <tableStyleInfo …/>
</table>
```
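Because the headings are stored as shared strings, a reader has to map each t="s" cell's <v> index into the shared-string table before it sees "Column1". The Python sketch below does that with ElementTree; the sample sharedStrings content is an assumption chosen to match the table's column names, and cell types other than shared strings and numbers are simply kept as raw text.

```python
import xml.etree.ElementTree as ET

MAIN = "{http://schemas.openxmlformats.org/spreadsheetml/2006/main}"
NS = 'xmlns="http://schemas.openxmlformats.org/spreadsheetml/2006/main"'


def load_shared_strings(sst_xml: str) -> list:
    """Return the shared-string table as a list indexed by position."""
    strings = []
    for si in ET.fromstring(sst_xml).iter(MAIN + "si"):
        t = si.find(MAIN + "t")
        if t is not None:  # plain string: a single <t>
            strings.append(t.text or "")
        else:  # rich text: concatenate the <t> of each <r> run
            strings.append("".join(r.findtext(MAIN + "t", "") for r in si.findall(MAIN + "r")))
    return strings


def cell_values(sheet_xml: str, shared: list) -> dict:
    """Map cell references (e.g. "A1") to Python values."""
    values = {}
    for c in ET.fromstring(sheet_xml).iter(MAIN + "c"):
        v = c.find(MAIN + "v")
        if v is None:
            continue
        cell_type = c.get("t", "n")  # "n" (number) is the default type
        if cell_type == "s":
            values[c.get("r")] = shared[int(v.text)]  # <v> holds an index
        elif cell_type == "n":
            values[c.get("r")] = float(v.text)
        else:
            values[c.get("r")] = v.text  # other types kept as raw text here
    return values


SST = f"<sst {NS}><si><t>Column1</t></si><si><t>Column2</t></si></sst>"
SHEET = f"""<worksheet {NS}><sheetData>
  <row r="1"><c r="A1" t="s"><v>0</v></c><c r="B1" t="s"><v>1</v></c></row>
  <row r="2"><c r="A2"><v>1</v></c><c r="B2"><v>4</v></c></row>
</sheetData></worksheet>"""

values = cell_values(SHEET, load_shared_strings(SST))
```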
ExcelPackage
- Open source API on CodePlex
- Wraps System.IO.Packaging and SpreadsheetML
http://www.codeplex.com/ExcelPackage
WordprocessingML Document
[Figure: WordprocessingML package parts: document (body), properties, fontTable, headers/footers, images, numbering definitions, styles, custom XML, footnotes/endnotes, comments]
A WordprocessingML file is a collection of multiple "stories":
- The main story
- Header(s) / footer(s)
- Footnote(s) / endnote(s)
- Subdocuments
- Comment(s)
Main Document Part
- The top-level element in the start part (e.g., document.xml) is document
- document has two optional child elements:
  - background, which specifies the background settings for the document
  - body, which contains the content of the main story
Block-Level Elements
The body element contains the main document story, made up of block-level elements:
- Paragraphs
- Tables
- Custom XML markup
- Alternate format chunks
- Subdocuments
- Final section properties
- Future extensibility containers
Elements nest: a table may contain a table, which in turn contains a paragraph, and so on.
Inline Structures
The <w:p> paragraph element contains inline structures:
- Runs (containing <w:t> text regions)
- Custom markup (can occur at block or inline level)
- Annotations (comments, tracked changes, bookmarks)
- DrawingML elements
- Fields (date, page number, document creator, etc.)
- Hyperlinks
Paragraphs <w:p>
The paragraph is the most basic unit of a WordprocessingML document. It contains three pieces of information:
- Paragraph properties
- Inline content
- Optional revision IDs used for document merge and compare
A paragraph may occur anywhere block-level content is allowed:
- At the top-most level within a story (e.g. header, footer, main document)
- Nested within a table cell
- Nested within a structured document tag or annotation markers
Paragraph Properties
Paragraph properties can be set directly on a paragraph (below) or in a paragraph style; there are 24 property settings in total.
```xml
<w:p>
  <w:pPr>
    <!-- children must appear in the schema's sequence order -->
    <w:keepNext/>
    <w:keepLines/>
    <w:pageBreakBefore/>
    <w:widowControl w:val="on"/>
    <w:suppressLineNumbers/>
    <w:suppressAutoHyphens/>
    <w:textboxTightWrap w:val="allLines"/>
  </w:pPr>
  … runs, paragraph content …
</w:p>
```
Runs <w:r>
- A run is a region of text with a common set of properties
- All text must be contained within runs
- All runs must be contained within paragraphs
- A run contains three types of information:
  - Run properties
  - Run content (text, fields, soft line breaks, pictures, etc.)
  - Optional revision IDs for document comparison
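The nesting rule (text inside runs, runs inside paragraphs) maps directly onto element construction. This Python ElementTree sketch builds a <w:p> with one run per text chunk; the paragraph helper is hypothetical, and setting xml:space="preserve" on chunks with leading or trailing spaces reflects the common practice of protecting that whitespace from being collapsed by consumers.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"
ET.register_namespace("w", W_NS)


def paragraph(*texts: str) -> ET.Element:
    """Build a <w:p> holding one <w:r><w:t> run per text chunk."""
    p = ET.Element(W + "p")
    for text in texts:
        run = ET.SubElement(p, W + "r")
        t = ET.SubElement(run, W + "t")
        t.text = text
        if text != text.strip():
            # Without xml:space="preserve", leading/trailing spaces may be dropped.
            t.set(XML_SPACE, "preserve")
    return p


p = paragraph("Hello, ", "Open XML")
print(ET.tostring(p, encoding="unicode"))
```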
Run Properties
- Define formatting for individual characters
- Font attributes, size/position, etc.; 24 properties in total
```xml
<w:r>
  <w:rPr>
    <w:rFonts w:ascii="Arial" w:hAnsi="Arial" w:cs="Arial"/>
    <w:b/>
    <w:i/>
    <w:dstrike w:val="true"/>
    <!-- w:sz is measured in half-points, so 11 means 5.5 pt -->
    <w:sz w:val="11"/>
  </w:rPr>
  … run content …
</w:r>
```
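A frequent pitfall with run properties is the unit of w:sz: it counts half-points, so an 11 pt font is written as 22. The sketch below builds a formatted run with Python's ElementTree, converting points to half-points and adding the rPr children in schema order. The styled_run helper and its parameters are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
W = "{%s}" % W_NS
ET.register_namespace("w", W_NS)


def styled_run(text: str, font: str = "Arial", size_pt: float = 11.0,
               bold: bool = False, italic: bool = False) -> ET.Element:
    """Build <w:r> with its run properties followed by its text."""
    run = ET.Element(W + "r")
    rpr = ET.SubElement(run, W + "rPr")  # properties come before content
    # Children are added in schema order: rFonts, b, i, ..., sz.
    fonts = ET.SubElement(rpr, W + "rFonts")
    for attr in ("ascii", "hAnsi", "cs"):
        fonts.set(W + attr, font)
    if bold:
        ET.SubElement(rpr, W + "b")
    if italic:
        ET.SubElement(rpr, W + "i")
    # w:sz counts half-points, so an 11 pt font is written as 22.
    ET.SubElement(rpr, W + "sz").set(W + "val", str(round(size_pt * 2)))
    ET.SubElement(run, W + "t").text = text
    return run


run = styled_run("Open XML", bold=True, italic=True)
```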
PresentationML
[Figure: PresentationML package parts: presentation, presentation properties, view properties, slides, slide layouts, slide masters, notes slides, notes masters, handout masters, themes, fonts, code]
The Minimal pptx
- Presentation element (presentation.xml), which lists the:
  - Slide masters
  - Notes masters
  - Handout masters
  - Slides
- Relationships part, which links to the slide parts
Slide Parts
```xml
<p:sld xmlns:p="…/presentationml/2006/main" xmlns:a="…/drawingml/2006/main" …>
  <p:cSld>
    <p:spTree>
      <!-- Shape -->
      <p:sp>
        <p:nvSpPr>
          <p:cNvPr id="2" name="7-Point Star 1"/>
          …
      <!-- Textbox -->
      <p:sp>
        <p:nvSpPr>
          <p:cNvPr id="3" name="TextBox 2"/>
          …
      <!-- Chart -->
      <p:graphicFrame>
        <p:nvGraphicFramePr>
          <p:cNvPr id="4" name="Chart 3"/>
          …
    </p:spTree>
  </p:cSld>
  <p:clrMapOvr>
    <a:masterClrMapping/>
  </p:clrMapOvr>
</p:sld>
```
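Every object in the shape tree carries its id and name in a cNvPr element one level down, inside its non-visual properties. The Python sketch below walks a trimmed, well-formed version of the slide above (required elements such as nvPr and spPr are omitted, so it is not schema-valid) and lists each object; list_shapes is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

P = "{http://schemas.openxmlformats.org/presentationml/2006/main}"

# Trimmed slide: real slides carry more required elements (nvPr, spPr, ...).
SLIDE = """<p:sld xmlns:p="http://schemas.openxmlformats.org/presentationml/2006/main">
  <p:cSld><p:spTree>
    <p:sp><p:nvSpPr><p:cNvPr id="2" name="7-Point Star 1"/></p:nvSpPr></p:sp>
    <p:sp><p:nvSpPr><p:cNvPr id="3" name="TextBox 2"/></p:nvSpPr></p:sp>
    <p:graphicFrame><p:nvGraphicFramePr>
      <p:cNvPr id="4" name="Chart 3"/>
    </p:nvGraphicFramePr></p:graphicFrame>
  </p:spTree></p:cSld>
</p:sld>"""


def list_shapes(slide_xml: str) -> list:
    """Return (element kind, id, name) for each object in the shape tree."""
    sp_tree = ET.fromstring(slide_xml).find(f"{P}cSld/{P}spTree")
    shapes = []
    for child in sp_tree:
        # cNvPr sits one level down, inside nvSpPr / nvGraphicFramePr / nvPicPr ...
        c_nv_pr = child.find("*/" + P + "cNvPr")
        if c_nv_pr is not None:
            kind = child.tag.split("}", 1)[1]
            shapes.append((kind, int(c_nv_pr.get("id")), c_nv_pr.get("name")))
    return shapes


print(list_shapes(SLIDE))
```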
Object Parts: DrawingML
[Figure: a slide containing a shape, a text box, and a chart; the chart links to its chart part (chart1.xml), which links to its data source]
DrawingML
5 main types of objects:
- Shape
- Group shape
- Connector
- Picture
- Graphic frame: a general-purpose container used for charts, diagrams, and tables
The most widely used elements are property elements:
- Non-visual properties (nvPrs): the union of common non-visual properties and object-specific ones
- Visual properties: object specific
Shapes
- Preset geometry: pick the preset shape, then specify the adjust values for the shape
- Text geometry: pick the preset text shape, then specify the adjust values for the text shape
- Custom geometry: not covered in this presentation
Shape Line and Fill Properties
BLIP (Binary Large Image or Picture) fill; r:embed holds the relationship ID of the image data:
```xml
<a:blipFill>
  <a:blip r:embed="rId2"/>
  <a:stretch>
    <a:fillRect/>
  </a:stretch>
</a:blipFill>
```
Dashed line with a solid fill:
```xml
<a:ln>
  <a:solidFill>
    <a:srgbClr val="4F81BD"/>
  </a:solidFill>
  <a:prstDash val="sysDash"/>
</a:ln>
```
Gradient fill; each <a:gs> is a gradient stop with its color:
```xml
<a:gradFill flip="none" rotWithShape="1">
  <a:gsLst>
    <a:gs pos="0">
      <a:srgbClr val="DDEBCF"/>
    </a:gs>
    <a:gs pos="50000">
      <a:srgbClr val="9CB86E"/>
    </a:gs>
    ...
  </a:gsLst>
  <a:lin ang="4200000" scaled="0"/>
  <a:tileRect/>
</a:gradFill>
```
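The numbers in a gradient fill use DrawingML's fixed-point units: a stop position is in thousandths of a percent (50000 is 50%) and the linear angle is in 60,000ths of a degree (4200000 is 70°). The Python sketch below decodes the sample gradient; the two listed stops are kept and the elided ones are dropped, and describe_gradient is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"

GRADIENT = """<a:gradFill xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main"
    flip="none" rotWithShape="1">
  <a:gsLst>
    <a:gs pos="0"><a:srgbClr val="DDEBCF"/></a:gs>
    <a:gs pos="50000"><a:srgbClr val="9CB86E"/></a:gs>
  </a:gsLst>
  <a:lin ang="4200000" scaled="0"/>
  <a:tileRect/>
</a:gradFill>"""


def describe_gradient(grad_xml: str):
    """Return ([(stop position in %, RGB hex)], linear angle in degrees)."""
    grad = ET.fromstring(grad_xml)
    # gs/@pos is in thousandths of a percent: 50000 -> 50 %.
    stops = [
        (int(gs.get("pos")) / 1000, gs.find(A + "srgbClr").get("val"))
        for gs in grad.iter(A + "gs")
    ]
    # lin/@ang is in 60,000ths of a degree: 4200000 -> 70 degrees.
    angle = int(grad.find(A + "lin").get("ang")) / 60000
    return stops, angle


print(describe_gradient(GRADIENT))  # ([(0.0, 'DDEBCF'), (50.0, '9CB86E')], 70.0)
```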
Pictures
```xml
<p:pic>
  <p:nvPicPr>
    <p:cNvPr id="4" name="lake.jpeg"/>
    <p:cNvPicPr>
      <a:picLocks noChangeAspect="1"/>
    </p:cNvPicPr>
    <p:nvPr/>
  </p:nvPicPr>
  <p:blipFill>
    <a:blip r:embed="rId2"/>
    <a:stretch>
      <a:fillRect/>
    </a:stretch>
  </p:blipFill>
  <p:spPr>
    <a:xfrm>
      <a:off x="762000" y="571500"/>
      <a:ext cx="7620000" cy="5715000"/>
    </a:xfrm>
    <a:prstGeom prst="rect">
      <a:avLst/>
    </a:prstGeom>
  </p:spPr>
</p:pic>
```
- Define a picture: <p:pic/>
- Source image relationship ID: <a:blip r:embed="rId2"/>
- Acts similar to a shape: <p:spPr/>
- Non-visual picture properties convey picture-specific save properties: <p:nvPicPr/>
- Similar for audio and video
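The offset and extent in <a:xfrm> are in English Metric Units (EMUs), with 914,400 EMUs per inch. Converting the picture's transform shows it is placed 0.83 in from the left and is 8.33 in by 6.25 in. The Python sketch below performs that conversion; xfrm_inches is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
EMU_PER_INCH = 914400

XFRM = """<a:xfrm xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
  <a:off x="762000" y="571500"/>
  <a:ext cx="7620000" cy="5715000"/>
</a:xfrm>"""


def xfrm_inches(xfrm_xml: str) -> dict:
    """Convert an <a:xfrm> offset and extent from EMUs to inches."""
    xfrm = ET.fromstring(xfrm_xml)
    off, ext = xfrm.find(A + "off"), xfrm.find(A + "ext")
    return {
        "x": int(off.get("x")) / EMU_PER_INCH,
        "y": int(off.get("y")) / EMU_PER_INCH,
        "width": int(ext.get("cx")) / EMU_PER_INCH,
        "height": int(ext.get("cy")) / EMU_PER_INCH,
    }


box = xfrm_inches(XFRM)
print(f'{box["width"]:.2f} x {box["height"]:.2f} in')  # 8.33 x 6.25 in
```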
Pictures vs. Shapes

|                  | Shapes                            | Pictures                    |
|------------------|-----------------------------------|-----------------------------|
| Fill             | Single fill allowed               | Two overlaid fills allowed  |
| Borders          | Grow inward or outward            | Grow outward                |
| Aspect ratio     | Must be locked by the application | Lock aspect ratio flag      |
| Text             | Can have text attached            | Cannot have text attached   |
| Shape properties | Yes                               | Yes                         |
| UI               | Shape-specific UI enabled         | Picture-specific UI enabled |
Graphic Objects
- The graphic element represents a single graphical object
- The graphicData element and its uri attribute:
  - Specify the namespace for the embedded content
  - Tell the consumer how to interpret the graphicData
- The ability to render is application specific
- Office supports a set of specific URI values, including:
  - http://schemas.openxmlformats.org/drawingml/2006/chart
  - http://schemas.openxmlformats.org/drawingml/2006/diagram
Graphic Object
```xml
<a:graphic>
  <!-- this uri means a chart follows -->
  <a:graphicData uri="http://schemas.../drawingml/2006/chart">
    <c:chart xmlns:c="http://schemas.../drawingml/2006/chart"
             xmlns:r="http://schemas.../officeDocument/2006/relationships"
             r:id="rd123232"/>
  </a:graphicData>
</a:graphic>
```
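A consumer decides how to handle a graphic by switching on the graphicData uri, and for charts it then follows the r:id to the separate chart part. The Python sketch below shows that dispatch. The chart and diagram URIs come from the slide above; the table URI is an addition here, and inspect_graphic is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional, Tuple

A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"
C = "{http://schemas.openxmlformats.org/drawingml/2006/chart}"
R_ID = "{http://schemas.openxmlformats.org/officeDocument/2006/relationships}id"

GRAPHIC_DATA_KINDS = {
    "http://schemas.openxmlformats.org/drawingml/2006/chart": "chart",
    "http://schemas.openxmlformats.org/drawingml/2006/diagram": "diagram",
    "http://schemas.openxmlformats.org/drawingml/2006/table": "table",
}


def inspect_graphic(graphic_xml: str) -> Tuple[str, Optional[str]]:
    """Classify a graphic by its graphicData uri; for charts, also return
    the relationship id that points at the separate chart part."""
    data = ET.fromstring(graphic_xml).find(A + "graphicData")
    kind = GRAPHIC_DATA_KINDS.get(data.get("uri"), "unknown")
    rel_id = None
    if kind == "chart":
        chart = data.find(C + "chart")
        rel_id = chart.get(R_ID) if chart is not None else None
    return kind, rel_id


GRAPHIC = """<a:graphic xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
  <a:graphicData uri="http://schemas.openxmlformats.org/drawingml/2006/chart">
    <c:chart xmlns:c="http://schemas.openxmlformats.org/drawingml/2006/chart"
      xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
      r:id="rd123232"/>
  </a:graphicData>
</a:graphic>"""

print(inspect_graphic(GRAPHIC))  # ('chart', 'rd123232')
```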
Charts
- Graphic object definition
  - References a separate XML chart part
  - Defined in the DrawingML namespace
- Chart XML part
  - Visual representation of data
  - Includes a cache of data for the chart
  - Includes formatting using DrawingML
- Data relationship
  - External relationship to a file, or
  - Internal relationship to an embedded spreadsheet
  - Spreadsheets point to their own data
- Chart drawing
  - Contains shapes and pictures drawn on the chart
[Figure: a graphic object references an XML chart part, which in turn references its data source and its chart drawing]
Build a Document in Code
Resources
- OpenXMLDeveloper.org
- Open XML SDK
- Package Explorer
- Code Snippets
http://blogs.nusoftsolutions.com/DTruxall/