Page 1: 302 sargent word2007-ssp2008

Math Editing and Display in Word 2007

Murray Sargent III
Publisher Text Services

28-may-2008

Page 2: 302 sargent word2007-ssp2008

Overview

• 8 math infrastructures enable better math display/editing
• New Office math edit/display environment
• Interoperate with math programs such as Mathematica, Maple, publisher workflow
• Input methods and formats
• Layout
• Math font

Page 3: 302 sargent word2007-ssp2008

Complex Project

• Intricacies of math typesetting
• Creating and using a large set of glyph variants
• Vagaries of math notation
• Embedding math zones into international text environments
• Interaction with complex scripts
• Math in other objects like hyperlinks, ruby
• Input with non-ASCII keyboards

Page 4: 302 sargent word2007-ssp2008

Eight Math Infrastructures

• [La]TeX: current tech-doc standards
• Unicode 5.0: includes ~2000 math symbols
• MathML 2.0: math K–12 and beyond
• OpenType font technology: special math tables
• New math font (Cambria Math)
• Math layout handler
• Shared math input components
• MS Office environment, autocorrect

Page 5: 302 sargent word2007-ssp2008

[La]TeX

• Widely used, high-quality tech document preparation language
• Simple ASCII keyboard entry
• Usage and math typography are very well documented
• Stable since 1990
• Complex scenarios are hard to edit
• Numerous dialects, user macros, and lack of Unicode complicate interchange
• Fonts aren’t well suited to screen display

Page 6: 302 sargent word2007-ssp2008

Unicode 5.0

• 340 math chars exist in ASCII, U+2200 block, arrows, combining marks
• 1016 math alphanumeric characters are in Unicode Plane 1 or Letterlike Symbols
• 591 new math symbols and operators are on BMP
• One math variant selector
• One new combining character (reverse solidus)
• New math characters were requested by STIX

Page 7: 302 sargent word2007-ssp2008

Basic Set of Alphanumeric Characters

• Latin digits (0 - 9)
• Upper- & lowercase Latin letters (a - z, A - Z)
• Uppercase Greek letters Α - Ω plus the nabla ∇ and a variant of theta Θ
• Lowercase Greek letters α - ω plus the partial differential sign ∂ and glyph variants of ε, θ, κ, φ, ρ, and π
• Only unaccented forms of letters are used

Page 8: 302 sargent word2007-ssp2008

Legibility Loss

Without math alphabetics, the Hamiltonian formula

ℋ = ∫ dτ [εE² + μH²]

becomes an integral equation

H = ∫ dτ [εE² + μH²]

Page 9: 302 sargent word2007-ssp2008

Math Alphanumeric Characters

• Math needs various Latin and Greek styles like normal, bold, italic, script, Fraktur, and open-face
• May appear to be font variations, but have distinct semantics and spacings
• Without these distinctions, you get gibberish, violating Unicode rule: plain text must contain enough info to permit text to be rendered legibly, and nothing more
• Plain-text searches should distinguish between alphabets, e.g., a search for script ℋ shouldn’t match H, etc.
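The point about distinct semantics can be checked directly against the Unicode character database. The following is a minimal Python sketch (not part of the original slides) showing that plain H, math italic H, and script H are three separate code points, that a plain-text search tells them apart, and that compatibility normalization throws the distinction away. The helper names are illustrative only.

```python
import unicodedata

# Three "H" characters that look like font variants but are distinct code points.
PLAIN_H = "H"             # U+0048
ITALIC_H = "\U0001D43B"   # math italic H, in Plane 1
SCRIPT_H = "\u210B"       # script H, in the Letterlike Symbols block


def describe(ch: str) -> str:
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


def fold_to_plain(text: str) -> str:
    """Apply NFKC normalization, which discards the math style."""
    return unicodedata.normalize("NFKC", text)


if __name__ == "__main__":
    for ch in (PLAIN_H, ITALIC_H, SCRIPT_H):
        print(describe(ch))
    # A plain-text search for script H does not match an ordinary H.
    print(SCRIPT_H in "H = E")
    # After NFKC folding the Hamiltonian symbol is indistinguishable from H.
    print(fold_to_plain(SCRIPT_H) == PLAIN_H)
```

Software that normalizes with NFKC before searching would reintroduce exactly the ambiguity the slide warns about.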

Page 10: 302 sargent word2007-ssp2008

MathML

• MathML 1.0 (April, 1998) was the first World Wide Web Consortium (W3C) endorsed XML vocabulary
• Low-level format for describing mathematics as a basis for machine to machine communication
• MathML facilitates the use and re-use of scientific content on the Web
• MathML 2.0 (second edition released in late 2003) is now widely used in exchanging mathematical text
• MathML 2.0 spec has a wealth of math info

Page 11: 302 sargent word2007-ssp2008

MathML Presentation Markup

<mrow>
  <mi>E</mi>
  <mo>=</mo>
  <mrow>
    <mi>m</mi>
    <mo>&InvisibleTimes;</mo>
    <msup>
      <mi>c</mi>
      <mn>2</mn>
    </msup>
  </mrow>
</mrow>

Presentation markup directs how the math should be rendered.

E = mc²
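To see how such markup can be consumed by a program, here is a small Python sketch that parses the E = mc² example with the standard library and flattens it to a linear string. It is only an illustration under stated assumptions: the function `to_linear` is hypothetical, it handles just the four element types used above, and the named entity is replaced by the numeric reference `&#x2062;` because a generic XML parser without the MathML DTD does not know `&InvisibleTimes;`.

```python
import xml.etree.ElementTree as ET

# Same structure as the slide, with the invisible times written numerically.
MATHML = (
    "<mrow><mi>E</mi><mo>=</mo>"
    "<mrow><mi>m</mi><mo>&#x2062;</mo>"
    "<msup><mi>c</mi><mn>2</mn></msup></mrow></mrow>"
)

INVISIBLE_TIMES = "\u2062"


def to_linear(node: ET.Element) -> str:
    """Flatten a tiny subset of presentation MathML to a linear string."""
    if node.tag in ("mi", "mn", "mo"):
        text = node.text or ""
        # The invisible times carries meaning but has no visible form.
        return "" if text == INVISIBLE_TIMES else text
    if node.tag == "msup":
        base, exponent = list(node)
        return f"{to_linear(base)}^{to_linear(exponent)}"
    # mrow (and anything unrecognized): concatenate the children in order.
    return "".join(to_linear(child) for child in node)


if __name__ == "__main__":
    print(to_linear(ET.fromstring(MATHML)))
```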

Page 12: 302 sargent word2007-ssp2008

Office MathML (OMML)

<m:oMath>
  <m:r><m:t>E=m</m:t></m:r>
  <m:sSup>
    <m:e>
      <m:r><m:t>c</m:t></m:r>
    </m:e>
    <m:sup>
      <m:r><m:t>2</m:t></m:r>
    </m:sup>
  </m:sSup>
</m:oMath>

E = mc²
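The OMML tree above is regular enough to generate programmatically. The sketch below builds the same structure with Python's standard XML library. The helper functions (`m`, `run`, `superscript`, `build_emc2`) are made up for this illustration, and the namespace URI is the one used for the `m:` prefix in Office Open XML files; treat the result as a fragment for study, not a complete Word document part.

```python
import xml.etree.ElementTree as ET

# Namespace bound to the m: prefix in Office Open XML documents.
M_NS = "http://schemas.openxmlformats.org/officeDocument/2006/math"
ET.register_namespace("m", M_NS)


def m(tag: str) -> str:
    """Qualify a local tag name with the OMML namespace."""
    return f"{{{M_NS}}}{tag}"


def run(text: str) -> ET.Element:
    """Build <m:r><m:t>text</m:t></m:r>."""
    r = ET.Element(m("r"))
    ET.SubElement(r, m("t")).text = text
    return r


def superscript(base: str, exponent: str) -> ET.Element:
    """Build an <m:sSup> object with a base (<m:e>) and a superscript (<m:sup>)."""
    ssup = ET.Element(m("sSup"))
    ET.SubElement(ssup, m("e")).append(run(base))
    ET.SubElement(ssup, m("sup")).append(run(exponent))
    return ssup


def build_emc2() -> ET.Element:
    """Build the math zone for E = mc^2 shown on the slide."""
    omath = ET.Element(m("oMath"))
    omath.append(run("E=m"))
    omath.append(superscript("c", "2"))
    return omath


if __name__ == "__main__":
    print(ET.tostring(build_emc2(), encoding="unicode"))
```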

Page 13: 302 sargent word2007-ssp2008

MathML with Custom XML

• Can put arbitrary namespace attributes in MathML tags
• More complicated embellishments can use

<semantics>
  MathML representation
  <annotation-xml>
    Enhancements
  </annotation-xml>
</semantics>

Page 14: 302 sargent word2007-ssp2008

MathML Parsing

MathML can be tricky to parse. For sin x:

<mrow>
  <mi>sin</mi>
  <mo>&ApplyFunction;</mo>
  <mi>x</mi>
</mrow>

You don’t know it’s a function-apply object until you reach &ApplyFunction;, so expressions have to be analyzed as with the linear format
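The lookahead problem can be made concrete. In the Python sketch below, a reader has to scan the children of an `<mrow>` for the function-application operator (U+2061) before it can tell that the preceding `<mi>` is a function name. The function `find_function_applications` is a hypothetical helper, and the operator is written as the numeric reference `&#x2061;` so that a generic XML parser accepts it without the MathML entity definitions.

```python
import xml.etree.ElementTree as ET

FUNCTION_APPLY = "\u2061"

SIN_X = "<mrow><mi>sin</mi><mo>&#x2061;</mo><mi>x</mi></mrow>"


def find_function_applications(mrow: ET.Element) -> list:
    """Return (function name, argument text) pairs among the children of mrow."""
    children = list(mrow)
    pairs = []
    for i, child in enumerate(children):
        is_apply = child.tag == "mo" and child.text == FUNCTION_APPLY
        # The operator only makes sense with a neighbor on each side.
        if is_apply and 0 < i < len(children) - 1:
            name = "".join(children[i - 1].itertext())
            argument = "".join(children[i + 1].itertext())
            pairs.append((name, argument))
    return pairs


if __name__ == "__main__":
    print(find_function_applications(ET.fromstring(SIN_X)))
```

Nothing about the first `<mi>` marks it as a function; only the sibling that follows does, which is why the markup has to be analyzed rather than simply read off.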

Page 15: 302 sargent word2007-ssp2008

Linear Format

E=mc^2

E = mc²

Page 16: 302 sargent word2007-ssp2008

Math RTF

• Math RTF is OMML in RTF syntax
• Somewhat simplified (doesn’t need text tag)
• For example, <m:f> ... </m:f> → {\mf ... }
• Thoroughly defined in latest RTF spec
• Reading spec is great way to learn how Word represents math

Page 17: 302 sargent word2007-ssp2008

Accented characters

• Accents are handled by math accent object
• Accents may apply to multiple characters
• Accents may be flattened

Page 18: 302 sargent word2007-ssp2008

Vagaries of Math Notation

• Choice of subscript/superscript base
• Function arguments like [example not preserved]
• Integrands and n-aryands
• Absolute value ambiguities like |a|-|b|. Actually this example is unambiguous, but |a|b - c|d| has two possible meanings
• Context sensitive ellipses: … vs ⋯

Page 19: 302 sargent word2007-ssp2008

Math Spacing

• Operators have math spacing given by extended TeX spacing rules
• Function object gives correct spacing between object and neighbors, and between function name and argument
• n-aryand object gives correct spacing between n-ary operator and its n-aryand
• Automates away much of the need for TeX spacing “tweaks”
• Context-dependent operator spacing like + - . , :

Page 20: 302 sargent word2007-ssp2008

Font Sizing

• Text style, script style (70%), script script style (60%)
• Sub/sups…, fractions in line
• Cramped
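As a rough illustration of the sizing rule, the Python sketch below maps a script nesting level to a font size using the two percentages quoted on the slide. The function name and the clamping of deeper nesting to the script-script size are assumptions made for this example; a real layout engine would take the percentages from the math font rather than hard-coding them.

```python
# Percent of the text-style size for each script level (values from the slide).
SCALE_PERCENT = {0: 100, 1: 70, 2: 60}


def math_font_size(text_size_pt: float, script_level: int) -> float:
    """Return the font size for a script nesting level.

    Level 0 is text style, 1 is script style, and 2 or deeper is
    script-script style (assumed here not to shrink any further).
    """
    level = min(max(script_level, 0), 2)
    return text_size_pt * SCALE_PERCENT[level] / 100


if __name__ == "__main__":
    for level in range(4):
        print(level, math_font_size(10, level))
```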

Page 21: 302 sargent word2007-ssp2008

Confusables

• 1 vs l
• 𝑎 vs 𝛼
• 𝑣 vs 𝜈 vs 𝜐
• 𝒳 vs 𝜒
• Y vs Υ
• Other letter similarities are so close that they are avoided, e.g., UC alpha and LC omicron are never used.

Page 22: 302 sargent word2007-ssp2008

Math Input Methods

• Linear format input and manual buildup
• Formula autobuildup (FAB)
• Math ribbons
• Recognition of handwritten formulae
• Hex code input
• WYSIWYG editing
• Hybrid editing (combination of WYSIWYG and FAB)

Page 23: 302 sargent word2007-ssp2008

Hex to Unicode Input Method

• Type Unicode character hexadecimal code
• Make corrections as need be
• Type Alt+x to convert to character
• Type Alt+x to convert back to hex (useful especially for “missing glyph” character)
• Resolve ambiguities by selection
• Input higher-plane chars using 5 or 6-digit code
• MS Word and RichEdit standard
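The Alt+x toggle can be modeled in a few lines. The Python sketch below is an approximation written for this summary, not Word's or RichEdit's actual algorithm: the function `alt_x` and its rule of preferring the longest trailing run of hex digits that names a valid code point are assumptions. It does show why ambiguities arise, since ordinary text that happens to end in letters a through f is indistinguishable from a hex code.

```python
import re

# Between 2 and 6 hex digits at the very end of the text.
HEX_TAIL = re.compile(r"[0-9A-Fa-f]{2,6}$")


def is_valid_code_point(code: int) -> bool:
    """True for Unicode scalar values (surrogates excluded)."""
    return code <= 0x10FFFF and not 0xD800 <= code <= 0xDFFF


def alt_x(text: str) -> str:
    """Toggle the end of text between a hex code and the character it names."""
    match = HEX_TAIL.search(text)
    if match:
        digits = match.group()
        # Try the longest run first, then drop leading digits (keep at least 2).
        for start in range(len(digits) - 1):
            code = int(digits[start:], 16)
            if is_valid_code_point(code):
                return text[:match.start() + start] + chr(code)
    if text:
        # No hex code at the end: convert the last character back to hex.
        return text[:-1] + f"{ord(text[-1]):04X}"
    return text


if __name__ == "__main__":
    print(alt_x("222B"))          # hex code to character
    print(alt_x("\u222b"))        # character back to hex code
    print(alt_x("1D43B"))         # 5-digit code for a higher-plane character
```

In an editor, selecting the intended digits before pressing Alt+x is the way around the ambiguity, which is what "resolve ambiguities by selection" refers to.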

Page 24: 302 sargent word2007-ssp2008

Autocorrect Examples

• Type \delta and get δ, \Delta and get Δ
• Define \quadratic to be x = (-b ± √(b^2 - 4ac))/2a
• Then typing \quadratic<space> inserts: $x = \frac{-b \pm \sqrt{b^2 - 4ac}}{2a}$
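A table-driven model of this behavior fits in a short Python sketch. The entries for `\delta`, `\Delta`, and `\quadratic` come from the slide; the function `autocorrect_on_space` and the rule that an entry is a backslash followed by letters and a terminating space are simplifying assumptions. The expansion here produces linear-format text; building it up into two-dimensional math is a separate step.

```python
import re

# Autocorrect table: entries from the slide; \quadratic is user-defined there.
AUTOCORRECT = {
    r"\delta": "δ",
    r"\Delta": "Δ",
    r"\quadratic": "x = (-b ± √(b^2 - 4ac))/2a",
}

# A backslash, one or more letters, then the space that terminates the entry.
_ENTRY = re.compile(r"(\\[A-Za-z]+) $")


def autocorrect_on_space(typed: str, table: dict = AUTOCORRECT) -> str:
    """Expand a trailing autocorrect entry once the user has typed a space.

    The terminating space is consumed. Unknown entries are left untouched.
    """
    match = _ENTRY.search(typed)
    if match and match.group(1) in table:
        return typed[:match.start()] + table[match.group(1)]
    return typed


if __name__ == "__main__":
    print(autocorrect_on_space("\\delta "))
    print(autocorrect_on_space("\\quadratic "))
```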

Page 25: 302 sargent word2007-ssp2008

Math Alphabetics

• \scriptA, \frakturA, \doubleA, etc., are used to insert math script, Fraktur, and double-struck alphabetics
• Italic and bold are controlled by italic & bold format tools and only apply to math alphabetics
• Italic and/or bold is ignored for characters that don’t have corresponding Unicode characters

Page 26: 302 sargent word2007-ssp2008

Linear format math

• Simple operand is a span of alphanumeric characters
• E.g., simple numerator or denominator is terminated by any nonalphanumeric character
• abc/d gives $\frac{abc}{d}$
• More complicated operands use parentheses ( ), brackets [ ], or braces { }
• Outermost parens in fractions aren’t displayed in built-up form

Page 27: 302 sargent word2007-ssp2008

Linear format math (cont)

E.g., plain text (a + c)/d displays as $\frac{a + c}{d}$

• Easier to read than TeX’s, e.g., {a + c \over d}
• MathML: <mfrac><mrow><mi>a</mi><mo>+</mo><mi>c</mi></mrow><mrow><mi>d</mi></mrow></mfrac>
• Neat feature: linear-format text looks like math
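The operand rules for fractions can be sketched in Python as follows. This is a deliberately small illustration, not the Word parser: it handles only the first `/` in the input, recognizes only parentheses as grouping characters, expects no spaces directly around the slash, and emits LaTeX `\frac` as a stand-in for the built-up form. All function names are invented for the example.

```python
def operand_start(text: str, end: int) -> int:
    """Return the start index of the operand that ends just before index end."""
    if end > 0 and text[end - 1] == ")":
        depth = 0
        for i in range(end - 1, -1, -1):
            if text[i] == ")":
                depth += 1
            elif text[i] == "(":
                depth -= 1
                if depth == 0:
                    return i
        raise ValueError("unbalanced parentheses")
    i = end
    while i > 0 and text[i - 1].isalnum():   # simple operand: alphanumeric span
        i -= 1
    return i


def operand_end(text: str, start: int) -> int:
    """Return the index just past the operand that begins at index start."""
    if start < len(text) and text[start] == "(":
        depth = 0
        for i in range(start, len(text)):
            if text[i] == "(":
                depth += 1
            elif text[i] == ")":
                depth -= 1
                if depth == 0:
                    return i + 1
        raise ValueError("unbalanced parentheses")
    i = start
    while i < len(text) and text[i].isalnum():
        i += 1
    return i


def strip_outer_parens(operand: str) -> str:
    """Outermost parentheses of a fraction operand are not displayed."""
    if operand.startswith("(") and operand.endswith(")"):
        return operand[1:-1]
    return operand


def build_up_fraction(text: str) -> str:
    """Replace the first linear-format fraction in text with \\frac{..}{..}."""
    slash = text.find("/")
    if slash < 0:
        return text
    start = operand_start(text, slash)
    end = operand_end(text, slash + 1)
    numerator = strip_outer_parens(text[start:slash])
    denominator = strip_outer_parens(text[slash + 1:end])
    return f"{text[:start]}\\frac{{{numerator}}}{{{denominator}}}{text[end:]}"


if __name__ == "__main__":
    for example in ("abc/d", "(a + c)/d", "x = (-b ± √(b^2 - 4ac))/2a"):
        print(example, "->", build_up_fraction(example))
```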

Page 28: 302 sargent word2007-ssp2008

Subscripts and Superscripts

• Unicode has numeric subscripts and superscripts along with some operators (U+2070-U+208E): these are converted to regular subscripts/superscripts
• Others need some kind of markup like <msup>…</msup>
• Use TeX’s _ and ^ subscript/superscript ops for input; they can be displayed as a subscripted down arrow and superscripted up arrow
• Use parentheses as for fractions to overrule built-in precedence order
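One way to picture the conversion of Unicode subscript and superscript characters is the Python sketch below, which rewrites them into `^` and `_` linear-format operators using the decomposition data in the Unicode character database. The function names are hypothetical, and two choices are assumptions of this example rather than documented Word behavior: the Latin-1 characters ¹ ² ³ are treated the same way as the U+2070-U+208E range, and multi-character scripts are wrapped in parentheses.

```python
import unicodedata


def script_kind(ch: str):
    """Return ('^', plain) or ('_', plain) for a Unicode super/subscript, else None."""
    in_range = "\u2070" <= ch <= "\u208e" or ch in "\u00b9\u00b2\u00b3"
    if not in_range:
        return None
    decomposition = unicodedata.decomposition(ch)
    for tag, operator in (("<super> ", "^"), ("<sub> ", "_")):
        if decomposition.startswith(tag):
            codes = decomposition[len(tag):].split()
            return operator, "".join(chr(int(code, 16)) for code in codes)
    return None   # unassigned code point inside the range


def unicode_scripts_to_linear(text: str) -> str:
    """Rewrite runs of Unicode super/subscript characters as ^... or _... ."""
    out = []
    i = 0
    while i < len(text):
        kind = script_kind(text[i])
        if kind is None:
            out.append(text[i])
            i += 1
            continue
        operator, run = kind[0], ""
        # Collect the whole run of characters of the same kind.
        while i < len(text):
            nxt = script_kind(text[i])
            if nxt is None or nxt[0] != operator:
                break
            run += nxt[1]
            i += 1
        out.append(operator + (run if len(run) == 1 else f"({run})"))
    return "".join(out)


if __name__ == "__main__":
    for example in ("E=mc\u00b2", "x\u2081\u2080", "a\u2081\u00b2"):
        print(unicode_scripts_to_linear(example))
```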

Page 29: 302 sargent word2007-ssp2008

Formula Autobuildup

• Enter formulas in linear format in a math zone
• When a character is typed that renders an expression syntactically unambiguous, the expression is built up
• Edit expressions in built-up form or in linear form
• For integrals, type \int (which autocorrects to ∫) optionally followed by subscript and superscript for limits, which auto build up
• Can autocorrect \<letters> to built-up characters or expressions

Page 30: 302 sargent word2007-ssp2008

Roles of Space (U+0020)

• The ASCII space is rarely needed inside math expressions, since math spacing is automatic
• Use to terminate autocorrect entries and to terminate expressions. When so used, is deleted
• Use as command to build up math objects
• Use to define spacings for , . and : and to force a unary operator to display with binary spacing
• A space builds up one subexpression; other operators build up as many as they can

Page 31: 302 sargent word2007-ssp2008

Unicode Spaces

Space            Unicode   Autocorrect
0 em             U+200B    \zwsp
1/18 em          U+200A    \hairsp
3/18 em          U+2009    \thinsp
4/18 em          U+205F    \medsp
5/18 em          U+2005    \thicksp
6/18 em          U+2004    \vthicksp
9/18 em          U+2002    \ensp
18/18 em         U+2003    \emsp
(digit width)    U+2007    \numsp
(space width)    U+00A0    \nbsp
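The table translates naturally into a lookup structure. The Python sketch below stores each autocorrect name with its character and the nominal width from the slide, and adds a hypothetical helper, `total_width_em`, that sums the widths of a run of spaces. The widths for `\numsp` and `\nbsp` are left as `None` because they depend on the font. The printed Unicode names come from the standard character database.

```python
import unicodedata
from fractions import Fraction

# Autocorrect name -> (character, nominal width in em, or None if font-dependent).
MATH_SPACES = {
    r"\zwsp": ("\u200B", Fraction(0)),
    r"\hairsp": ("\u200A", Fraction(1, 18)),
    r"\thinsp": ("\u2009", Fraction(3, 18)),
    r"\medsp": ("\u205F", Fraction(4, 18)),
    r"\thicksp": ("\u2005", Fraction(5, 18)),
    r"\vthicksp": ("\u2004", Fraction(6, 18)),
    r"\ensp": ("\u2002", Fraction(9, 18)),
    r"\emsp": ("\u2003", Fraction(18, 18)),
    r"\numsp": ("\u2007", None),
    r"\nbsp": ("\u00A0", None),
}

_WIDTH_BY_CHAR = {char: width for char, width in MATH_SPACES.values()}


def total_width_em(spaces: str) -> Fraction:
    """Sum the nominal widths of a string of fixed-width Unicode spaces."""
    total = Fraction(0)
    for ch in spaces:
        width = _WIDTH_BY_CHAR.get(ch)
        if width is None:
            raise ValueError(f"no fixed width for U+{ord(ch):04X}")
        total += width
    return total


if __name__ == "__main__":
    for name, (char, width) in MATH_SPACES.items():
        print(f"{name:10} U+{ord(char):04X} {unicodedata.name(char)} {width}")
```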

Page 32: 302 sargent word2007-ssp2008

Operators

Operator                        Precedence
CR                              0
opOpen                          1
opClose                         2
opSeparator                     3
concatenation                   4
/ \atop                         5
opNary                          6
_ ^ opFApply \above \below      7
□ ∛ ∜ ■ opHbracket              8
opAccent                        9
opUniSubSup                     10

Page 33: 302 sargent word2007-ssp2008

Four Math Invisibles

• There are four “invisible” math control codes
• Used for semantic content and usually don’t display a glyph. May have a small width, e.g., Function Apply has \thinsp

Math control code           Unicode
Invisible Function Apply    U+2061
Invisible Times             U+2062
Invisible Comma             U+2063
Invisible Plus              U+2064
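These four code points can be inspected with Python's Unicode database, as in the short sketch below. The labels in the dictionary follow the slide; the official Unicode names differ slightly for two of them, which the script prints. The helper `strip_invisibles` is a hypothetical example of what a plain-text consumer that does not understand the semantics might do.

```python
import unicodedata

# Slide label -> character.
INVISIBLES = {
    "Invisible Function Apply": "\u2061",
    "Invisible Times": "\u2062",
    "Invisible Comma": "\u2063",
    "Invisible Plus": "\u2064",
}

_STRIP_TABLE = {ord(ch): None for ch in INVISIBLES.values()}


def strip_invisibles(text: str) -> str:
    """Remove the four invisible math operators from text."""
    return text.translate(_STRIP_TABLE)


if __name__ == "__main__":
    for label, ch in INVISIBLES.items():
        # All four are format characters (general category Cf).
        print(f"{label:26} U+{ord(ch):04X} "
              f"{unicodedata.name(ch)} {unicodedata.category(ch)}")
    print(strip_invisibles("sin\u2061x"))
```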

Page 34: 302 sargent word2007-ssp2008

Math Layout

Collaboration between 5 entities:

• Unicode rich-text text processing program such as Word or RichEdit
• LineServices math handler
• Page/TableServices math handler
• Math font, e.g., Cambria Math
• Math-font handler

Page 35: 302 sargent word2007-ssp2008

Equation Breaking & Numbering

• PTS math handler can break equations into multiple lines automatically or by user breaks
• PTS can handle layout of equation numbers
• Client needs to support “math paragraph”
• Two kinds of user breaks: at operator via context menu, at line break (Shift+Enter)
• At operator indentation: each TAB indents to next binary/relational operator
• Line break: align at specific operators, e.g., =

Page 36: 302 sargent word2007-ssp2008

Math Engine Objects

Page 37: 302 sargent word2007-ssp2008

Glyph Variants

• Subscripts/superscripts
• Primes
• Dotless i, j used in bases of accent objects
• Flattened and wide accents
• Growable brackets, integrals, arrows
• Display of differentials using U+2146
• Mirror images for right-to-left math
• Variation selector U+FE00

Page 38: 302 sargent word2007-ssp2008

Cambria Math Font

• Cambria typeface designed by Jelle Bosma
• Extended for math by Ross Mills and Andrei Burago in collaboration with the ClearType and math-layout groups
• Contains extensive math tables, glyph variants and much of the Unicode math set
• Is designed with ClearType and excellent screen readability in mind
• Enables best screen-resolution display of math

Page 39: 302 sargent word2007-ssp2008

New Math Fonts

• Cambria Math has new version with more math characters, e.g., U+2900..U+2AFF
• 202 math characters still needed for Unicode 5.1
• STIX Times Roman math font is in beta; doesn’t support Word 2007 math well
• STIX has full math character set + some
• STIX font is Type 1, so it doesn’t work with the Office pdf writer
• Font demos

Page 40: 302 sargent word2007-ssp2008

Font Math Tables

• Specialized math tables have been created to control glyph placements
• Position subscripts/superscripts horizontally using cut-ins and italic corrections
• Many math constants: axis height, fraction rule thickness, etc.
• Compare kerning of [example not preserved]
• The math tables are formalized as OpenType tables accessible via mathfont.dll

Page 41: 302 sargent word2007-ssp2008

Math Constants

Page 42: 302 sargent word2007-ssp2008

User Spacing Adjustments

• Layout engine attempts to render with high typographic quality
• Users can spoil layout by inserting space where engine would insert it automatically
• Have autocorrect procedure to reduce this
• Users can insert Unicode spaces
• Phantoms and smashes
• Size and placement overrides

Page 43: 302 sargent word2007-ssp2008

Phantoms and Smashes

• Phantoms have size but no display. Can have both width & height, ascent only, descent only
• Smashes display, but remove one or more sizes, e.g., descent, ascent, and/or width

Page 44: 302 sargent word2007-ssp2008

Word 2007 Math Facility

• Elegant math entry and display
• Display is competitive with TeX
• Automatic line breaking, special kerning
• More math semantics than TeX: greater interoperability (Presentation MathML)
• Input with math ribbon, context menus
• Formula autobuildup input method
• WYSIWYG editing as well as linear format
• MS Math graphing calculator add-in

Page 45: 302 sargent word2007-ssp2008

What Word 2007 doesn’t have

• Built-in equation numbering
• Math Find/Replace
• OpenType enhancements (aside from math table functionality)
• Optimal line breaking
• Configurable math-zone vertical spacing
• [La]TeX import/export
• Document wide MathML support (only MathML for a single math zone)

Page 46: 302 sargent word2007-ssp2008

Conclusions

• Eight infrastructures allow us to do math display and editing better than ever before
• High quality math handler and font enable typography competitive with or better than TeX
• Best screen-resolution display of mathematics
• Streamlined input methods such as Formula Autobuildup
• Incorporated into Word 2007, Word down-level converter, Microsoft Math calculator
• Cambria Math font: state-of-art math font