Improving the output capabilities of Stata with Open Document Format xml Adam Jacobs Dianthus Medical Limited

Improving the output capabilities of Stata with Open Document Format XML

Adam Jacobs

Dianthus Medical Limited

Stata’s 3-fold capabilities

Statistics

Graphics

Data management

Statistics

Graphics

Data management

But there is a 4th...

Text output

A recent clinical study:

– 92 pages of raw data listings

– 124 pages of descriptive data tabulations

– 3 pages of statistical analysis

All from a study in 12 healthy volunteers

Stata’s text output

Problems with Stata’s text output

No pagination

No formatting (or only limited formatting with SMCL)

Variable labels not always shown

No Unicode support

No tables of contents

etc.

Some examples...

So how did I do it?

Open Document Format

An open standard, approved by ISO

XML based

For a variety of office-type documents

Used by the popular open-source office suite OpenOffice.org

Here, we are just interested in word-processing documents

.odt files

A .odt file is the native file format of OpenOffice.org Writer

A zip file

Contains various files, the most important of which is content.xml

content.xml is simply a plain-text file

Stata is good at writing plain-text files!
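Because an .odt file is an ordinary zip archive, its contents can be inspected with any zip library. The following is a minimal sketch in Python, used here only as a stand-in since the talk's own code is Stata. The function name `read_content_xml` is hypothetical.

```python
import zipfile


def read_content_xml(odt_path):
    """Return (member names, content.xml as text) for an .odt file."""
    with zipfile.ZipFile(odt_path) as odt:
        names = odt.namelist()
        # content.xml is plain text, so it decodes like any other UTF-8 file.
        content = odt.read("content.xml").decode("utf-8")
    return names, content
```

Calling `read_content_xml("report.odt")` on a document saved from Writer (the file name is an assumption) returns the list of files in the archive and the XML text that holds the document body.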

The Stata code

Creates the content.xml file by writing the data wrapped in the appropriate XML tags

content.xml is added to the other required files and zipped into a .odt file

The .odt file can be opened directly with Writer
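The write-then-zip workflow can be sketched as follows. This is not the author's Stata code: it is a Python illustration under stated assumptions, and the names `write_odt`, `MANIFEST` and `CONTENT_TEMPLATE` are hypothetical. It assumes that the smallest useful package contains a `mimetype` entry, `META-INF/manifest.xml` and `content.xml`. A real document would normally carry styles and metadata files as well.

```python
import zipfile
from xml.sax.saxutils import escape

MIMETYPE = "application/vnd.oasis.opendocument.text"

MANIFEST = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0" manifest:version="1.2">
  <manifest:file-entry manifest:full-path="/" manifest:media-type="application/vnd.oasis.opendocument.text"/>
  <manifest:file-entry manifest:full-path="content.xml" manifest:media-type="text/xml"/>
</manifest:manifest>
"""

CONTENT_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.2">
  <office:body>
    <office:text>
{paragraphs}
    </office:text>
  </office:body>
</office:document-content>
"""


def write_odt(path, lines):
    """Write each string in `lines` as one paragraph of a minimal .odt file."""
    # Escape &, < and > so arbitrary data cannot break the XML.
    paragraphs = "\n".join(
        "      <text:p>{}</text:p>".format(escape(line)) for line in lines
    )
    content = CONTENT_TEMPLATE.format(paragraphs=paragraphs)
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as odt:
        # The mimetype entry goes first and is stored uncompressed.
        odt.writestr("mimetype", MIMETYPE, compress_type=zipfile.ZIP_STORED)
        odt.writestr("META-INF/manifest.xml", MANIFEST)
        odt.writestr("content.xml", content)
```

The design point is the one made on the slide: all the document-specific work is plain-text generation of content.xml, and the packaging step is the same for every report.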

Some examples...

Basics of XML

<company name="Dianthus Medical Limited">
  <employee role="speaker">
    <firstname>Adam</firstname>
    <lastname>Jacobs</lastname>
  </employee>
  <employee role="delegate">
    <firstname>Flavia</firstname>
    <lastname>White</lastname>
  </employee>
</company>
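To show how elements, attributes and nesting are read back by a program, here is a small sketch that parses the example with Python's standard library. The XML is retyped with straight quotation marks, which XML requires around attribute values. The function name `employees` is hypothetical.

```python
import xml.etree.ElementTree as ET

DOC = """<company name="Dianthus Medical Limited">
  <employee role="speaker">
    <firstname>Adam</firstname>
    <lastname>Jacobs</lastname>
  </employee>
  <employee role="delegate">
    <firstname>Flavia</firstname>
    <lastname>White</lastname>
  </employee>
</company>"""


def employees(xml_text):
    """Return the company name and a (role, first, last) tuple per employee."""
    root = ET.fromstring(xml_text)  # the root element is <company>
    people = [
        # Attributes are read with get(); child element text with findtext().
        (e.get("role"), e.findtext("firstname"), e.findtext("lastname"))
        for e in root.findall("employee")
    ]
    return root.get("name"), people
```

`employees(DOC)` yields the value of the `name` attribute on the root element and one tuple per `<employee>` child, in document order.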

XML code for start of table

<table:table table:style-name="Table42">
  <table:table-column table:style-name="TabCol13"/>
  <table:table-column table:style-name="TabCol9"/>
  <table:table-column table:style-name="TabCol8"/>
  <table:table-column table:style-name="TabCol8"/>
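Opening lines like these are easy to generate from a list of column styles. The sketch below is a Python illustration, not the author's Stata implementation, and the function name `table_start` is hypothetical. It assumes the named styles are defined elsewhere in the document, since the tags here only refer to them by name.

```python
from xml.sax.saxutils import quoteattr


def table_start(table_style, column_styles):
    """Return the opening table tag followed by one column tag per style."""
    # quoteattr() escapes the value and adds the surrounding quotes.
    lines = ["<table:table table:style-name={}>".format(quoteattr(table_style))]
    for style in column_styles:
        lines.append(
            "<table:table-column table:style-name={}/>".format(quoteattr(style))
        )
    return "\n".join(lines)
```

`table_start("Table42", ["TabCol13", "TabCol9", "TabCol8", "TabCol8"])` produces the same tags as the slide. The table is deliberately left open, and the caller must emit `</table:table>` after the rows.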

XML code for table cells

<table:table-cell table:style-name="cell1211">
  <text:p text:style-name="Table_20_Contents">Mileage (mpg)</text:p>
</table:table-cell>
<table:table-cell table:style-name="cell1111">
  <text:p text:style-name="Table_20_Contents">N</text:p>
</table:table-cell>
<table:table-cell table:style-name="cell1111">
  <text:p text:style-name="Table_20_ContentsNumeric">52<text:s text:c="3"/></text:p>
</table:table-cell>
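Each cell follows the same pattern: a `table:table-cell` wrapping a `text:p` paragraph. The `<text:s text:c="3"/>` element stands for three spaces, which are written this way because ordinary runs of whitespace are collapsed when the document is read. A hedged Python sketch of a cell writer follows. The function name `table_cell` and its `pad` parameter are hypothetical, and the talk's actual code is in Stata and Mata.

```python
from xml.sax.saxutils import escape, quoteattr


def table_cell(cell_style, para_style, text, pad=0):
    """Return one table cell holding `text`, followed by `pad` encoded spaces."""
    # Trailing padding is written as a single text:s element with a count.
    spaces = '<text:s text:c="{}"/>'.format(pad) if pad > 0 else ""
    return (
        "<table:table-cell table:style-name={cs}>"
        "<text:p text:style-name={ps}>{t}{s}</text:p>"
        "</table:table-cell>"
    ).format(
        cs=quoteattr(cell_style),
        ps=quoteattr(para_style),
        t=escape(text),  # escape &, < and > in the data itself
        s=spaces,
    )
```

For example, `table_cell("cell1111", "Table_20_ContentsNumeric", "52", pad=3)` reproduces the last cell on the slide. Escaping the text matters in practice because variable labels can contain characters such as `<` or `&`.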

Was this a lot of work?

123 kB of code

21 ado files

45 Mata functions

And not finished yet!

Any questions?