PDF made easy with iText 7
PDF is dead! Long live PDF!
Benoit Lagae, Developer, iText Software
Bruno Lowagie, Chief Strategy Officer, iText Group
Is PDF dead?
PDF specifications
Everybody uses HTML
Source: http://duff-johnson.com/2014/03/10/98-percent-of-dot-com-is-html-but-38-percent-of-dot-gov-is-pdf/
But governments love PDF
Source: http://duff-johnson.com/2014/03/10/98-percent-of-dot-com-is-html-but-38-percent-of-dot-gov-is-pdf/
Percentage of PDF files:
• .org: 15%
• .gov: 38%
• .edu: 27%
Publication versus …
• No need to be self-contained
• May change over time
• Not all content produced by the author
  • e.g. advertisements
• Becoming more interactive
  • e.g. comments on a news article
… Document
• Self-contained
• Unchanging (non-dynamic)
• Able to be authenticated
• Able to be secured/protected
Not counting HTML, PDF is king
Source: http://duff-johnson.com/2015/02/12/the-8-most-popular-document-formats-on-the-web-in-2015/
Publication: HTML depends on context
Document: PDF is forever
An umbrella of standards:
PDF (Portable Document Format): first released by Adobe in 1993, ISO standard since 2008 (ISO 32000)
• PDF/E (engineering): since 2008, ISO 24517
• PDF/VT (printing): since 2010, ISO 16612
• PDF/X (graphic arts): since 2001, ISO 15930
• PDF/A (archive): since 2005, ISO 19005
• PDF/UA (accessibility): since 2012, ISO 14289
Related: XFDF (ISO), ECMAScript (ISO), PRC (ISO), PAdES (ETSI), ZUGFeRD
iText 7: a PDF engine
Image example
Image fox = new Image(ImageFactory.getImage(FOX));
Image dog = new Image(ImageFactory.getImage(DOG));
Paragraph p = new Paragraph("The quick brown ")
    .add(fox)
    .add(" jumps over the lazy ")
    .add(dog);
document.add(p);
On the importance of making a document
accessible
Can everyone read this?
Some structure is helpful
title
list item
list item
list item
Label Content
Can everyone read this?
How do we read a spider chart?
[Spider chart. Axis labels: Risk Management; Structured Finance; Mergers & acquisitions; Governance & Internal Control; Accounting Operations; Treasury operations; Management Information & Business Decision Support; Business Planning & Strategy; Finance Contribution to IT Management; Commercial Activities; Taxation; Functional Leadership]
Resolve abbreviations
What goes into rows / columns?
Make info color independent
Is this a better way to read data?
Adapting the ‘quick brown fox’ example for PDF/UA
PDF/UA (part 1)
PdfDocument pdf = new PdfDocument(new PdfWriter(dest));
Document document = new Document(pdf);
// Setting some required parameters
pdf.setTagged();
pdf.getCatalog().setLang(new PdfString("en-US"));
pdf.getCatalog().setViewerPreferences(
    new PdfViewerPreferences().setDisplayDocTitle(true));
PdfDocumentInfo info = pdf.getDocumentInfo();
info.setTitle("iText7 PDF/UA example");
// Create XMP metadata
pdf.createXmpMetadata();
PDF/UA (part 2)
// Fonts need to be embedded
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
// PDF/UA: set alt text
foxImage.getAccessibilityProperties().setAlternateDescription("Fox");
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
// PDF/UA: set alt text
dogImage.getAccessibilityProperties().setAlternateDescription("Dog");
p.add(dogImage);
document.add(p);
document.close();
Result
On the importance of making a document
archivable
PDF/A
• ISO 19005
  – Long-term preservation of documents
  – Approved parts will never become invalid
  – Individual parts define new, useful features
• Obligations and restrictions
  – Metadata: ISO 16684, Extensible Metadata Platform (XMP)
  – The document must be self-contained:
    • All fonts need to be embedded
    • No external movie, sound or other binary files
  – No JavaScript allowed
  – No encryption allowed
Three standards
• PDF/A-1 (2005)
  – Based on PDF 1.4
  – Level B (“basic”): visual appearance
  – Level A (“accessible”): visual appearance + structural and semantic properties (Tagged PDF)
• PDF/A-2 (2011)
  – Based on ISO 32000-1
  – Features introduced in PDF 1.5, 1.6, and 1.7:
    • Added support for JPEG2000, Collections, object-level XMP, optional content
    • Improved support for transparency, comment types and annotations, digital signatures
  – Level U (“unicode”): visual appearance + all text is in Unicode
• PDF/A-3 (2012)
  – Based on PDF/A-2 with only one difference: attachments do not need to be PDF/A
Adapting the ‘quick brown fox’ example for PDF/A
PDF/A-1b example
PdfADocument pdf = new PdfADocument(new PdfWriter(dest),
    PdfAConformanceLevel.PDF_A_1B,
    new PdfOutputIntent("Custom", "", "http://www.color.org",
        "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf);
// Create XMP metadata
pdf.createXmpMetadata();
// Fonts need to be embedded
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
p.add(dogImage);
document.add(p);
document.close();
Resulting PDF/A-1b
PDF/A-1a example
PdfADocument pdf = new PdfADocument(new PdfWriter(dest),
    PdfAConformanceLevel.PDF_A_1A,
    new PdfOutputIntent("Custom", "", "http://www.color.org",
        "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf);
pdf.setTagged();
pdf.createXmpMetadata();
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("The quick brown "));
Image foxImage = new Image(ImageFactory.getImage(FOX));
foxImage.getAccessibilityProperties().setAlternateDescription("Fox");
p.add(foxImage);
p.add(" jumps over the lazy ");
Image dogImage = new Image(ImageFactory.getImage(DOG));
dogImage.getAccessibilityProperties().setAlternateDescription("Dog");
p.add(dogImage);
document.add(p);
document.close();
Resulting PDF/A-1a
Real-world use: publishing a CSV file as PDF/A-3a and PDF/UA
United States database
United States example, part 1: initializations
PdfADocument pdf = new PdfADocument(
    new PdfWriter(dest),
    PdfAConformanceLevel.PDF_A_3A,
    new PdfOutputIntent("Custom", "", "http://www.color.org",
        "sRGB IEC61966-2.1", new FileInputStream(INTENT)));
Document document = new Document(pdf, PageSize.A4.rotate());
// Setting some required parameters
pdf.setTagged();                                            // PDF/UA and PDF/A Level a
pdf.getCatalog().setLang(new PdfString("en-US"));           // PDF/UA
pdf.getCatalog().setViewerPreferences(                      // PDF/UA
    new PdfViewerPreferences().setDisplayDocTitle(true));   // PDF/UA
PdfDocumentInfo info = pdf.getDocumentInfo();               // PDF/UA
info.setTitle("iText7 PDF/A-3 example");                    // PDF/UA
// Create XMP metadata
pdf.createXmpMetadata();                                    // PDF/UA and PDF/A Level a
United States example, part 2: add attachment
// Add attachment
PdfDictionary parameters = new PdfDictionary();
parameters.put(PdfName.ModDate, new PdfDate().getPdfObject());
PdfFileSpec fileSpec = PdfFileSpec.createEmbeddedFileSpec(
    pdf, Files.readAllBytes(Paths.get(DATA)),
    "united_states.csv", "united_states.csv",
    new PdfName("text/csv"), parameters, PdfName.Data, false);
fileSpec.put(new PdfName("AFRelationship"), new PdfName("Data"));
pdf.addFileAttachment("united_states.csv", fileSpec);
PdfArray array = new PdfArray();
array.add(fileSpec.getPdfObject().getIndirectReference());
pdf.getCatalog().put(new PdfName("AF"), array);
United States example, part 3: parse CSV file
PdfFont font = PdfFontFactory.createFont(FONT, true);
PdfFont bold = PdfFontFactory.createFont(BOLD_FONT, true);
// Parse the CSV file and add its data to a table
Table table = new Table(new float[]{4, 1, 3, 4, 3, 3, 3, 3, 1});
table.setWidthPercent(100);
BufferedReader br = new BufferedReader(new FileReader(DATA));
String line = br.readLine();
process(table, line, bold, true);
while ((line = br.readLine()) != null) {
    process(table, line, font, false);
}
br.close();
document.add(table);
document.close();
United States example, part 4: process each line
public void process(Table table, String line, PdfFont font, boolean isHeader) {
    StringTokenizer tokenizer = new StringTokenizer(line, ";");
    while (tokenizer.hasMoreTokens()) {
        if (isHeader) {
            table.addHeaderCell(
                new Cell().add(
                    new Paragraph(tokenizer.nextToken()).setFont(font)));
        } else {
            table.addCell(
                new Cell().add(
                    new Paragraph(tokenizer.nextToken()).setFont(font)));
        }
    }
}
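One caveat with the `process` method: `StringTokenizer` skips empty tokens, so a row with an empty field would yield fewer cells than the table has columns, and later cells could shift. The following is a minimal sketch using only the Java standard library; the class `CsvFields`, its helpers and the sample row are invented for illustration and are not part of the original slides or of iText.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

final class CsvFields {
    // Hypothetical helper: split on ';' and keep empty fields.
    // The limit -1 tells String.split to keep trailing empty strings as well.
    static List<String> splitFields(String line) {
        return Arrays.asList(line.split(";", -1));
    }

    // What StringTokenizer returns for the same line, for comparison.
    static List<String> tokenize(String line) {
        List<String> tokens = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line, ";");
        while (tokenizer.hasMoreTokens()) {
            tokens.add(tokenizer.nextToken());
        }
        return tokens;
    }

    public static void main(String[] args) {
        // Invented sample row with an empty third field.
        String line = "Alabama;AL;;Montgomery";
        System.out.println(tokenize(line).size());    // prints 3
        System.out.println(splitFields(line).size()); // prints 4
    }
}
```

If the data file never contains empty fields, both approaches produce the same cells.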
United States example: result
Real-world use: ZUGFeRD, the future of invoicing
Invoices: need to be archived
Invoices: need to be accessible
Invoices: need to be machine-readable
iText 7 and its value add-ons
New in iText 7: improved typography and support for Indic scripts
iText 5: missing links
Indic scripts:
• Only unsupported major script family
• Feature request #1
• Huge opportunity
  • Limited support in most other PDF libraries
Other features:
• Optional ligatures in Latin script
• Vowel diacritics in Arabic
Indic scripts: problems
• Lack of expertise
  • Unicode encodes 49 Indic scripts
  • Complex scripts with unique features
    • Glyph repositioning: ह + ि = हि
    • Glyph substitution: ம + ு = மு
    • Half-characters: त + ् + य = त्य
• Unsolvable issues for iText 5 font engine
  • No dedicated Unicode points for half-characters
  • No font lookups past ‘\uFFFF’
  • Ligaturization is context-dependent (virama)
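The ‘\uFFFF’ limit in the list above can be illustrated without iText. The sketch below uses only the Java standard library; it says nothing about iText internals and only shows why a lookup keyed on a single 16-bit `char` cannot address code points above U+FFFF. The class `BeyondBmp` is hypothetical, and U+11005 (from the Brahmi block) is just an arbitrary supplementary code point picked for the demonstration.

```java
final class BeyondBmp {
    // Number of 16-bit char units in the string.
    static int charUnits(String s) {
        return s.length();
    }

    // Number of Unicode code points in the string.
    static int codePoints(String s) {
        return s.codePointCount(0, s.length());
    }

    public static void main(String[] args) {
        // DEVANAGARI LETTER HA fits in a single char.
        String bmp = new String(Character.toChars(0x0939));
        // A supplementary code point needs a surrogate pair (two chars).
        String supplementary = new String(Character.toChars(0x11005));

        System.out.println(charUnits(bmp) + " " + codePoints(bmp));                     // prints "1 1"
        System.out.println(charUnits(supplementary) + " " + codePoints(supplementary)); // prints "2 1"
        System.out.println(Character.isSupplementaryCodePoint(0x11005));                // prints "true"
    }
}
```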
Indic scripts: solutions
Writing a new font engine
• Automatic script recognition
  • Based on Unicode ranges
• Flexibility = extensibility
  • Generic Shaper class
  • Separate module, only called when necessary
• Glyph replacement rules
  • Different per writing system
  • Alternate glyphs are font-dependent
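Script recognition based on Unicode ranges can be sketched with the Java standard library alone. This is not the iText 7 implementation; the class `ScriptDetector` and its method `firstScript` are hypothetical names, and the approach (return the script of the first code point that is not COMMON, INHERITED or UNKNOWN) is a deliberately simple assumption.

```java
import java.lang.Character.UnicodeScript;

final class ScriptDetector {
    // Returns the script of the first code point that belongs to a specific
    // script, or COMMON if the text has only digits, punctuation and the like.
    static UnicodeScript firstScript(String text) {
        for (int i = 0; i < text.length(); ) {
            int codePoint = text.codePointAt(i);
            UnicodeScript script = UnicodeScript.of(codePoint);
            if (script != UnicodeScript.COMMON
                    && script != UnicodeScript.INHERITED
                    && script != UnicodeScript.UNKNOWN) {
                return script;
            }
            i += Character.charCount(codePoint); // step over surrogate pairs correctly
        }
        return UnicodeScript.COMMON;
    }

    public static void main(String[] args) {
        System.out.println(firstScript("\u0938\u093E\u0939\u093F\u0924\u094D\u092F")); // prints DEVANAGARI
        System.out.println(firstScript("\u0B8E\u0BB4\u0BC1\u0BA4\u0BCD"));             // prints TAMIL
        System.out.println(firstScript("\u0627\u0644\u0643\u0627\u062A\u0628"));       // prints ARABIC
        System.out.println(firstScript("writer"));                                     // prints LATIN
    }
}
```

A dispatcher built on such a check could hand the text to a script-specific shaper only when the detected script requires it, which matches the "only called when necessary" idea on the slide.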
Indic scripts: examples
PdfFont font = PdfFontFactory.createFont(arial, PdfEncodings.IDENTITY_H, true);
// Devanagari
String txt = "\u0938\u093E\u0939\u093F\u0924\u094D\u092F\u0915\u093E\u0930"; // saahityakaar
document.add(new Paragraph(txt).setFont(font));
// Tamil (separate snippet)
String txt = "\u0B8E\u0BB4\u0BC1\u0BA4\u0BCD\u0BA4\u0BBE\u0BB3\u0BB0\u0BCD"; // eluttaalar
document.add(new Paragraph(txt).setFont(font));
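The Devanagari sample contains the sequence त + virama + य, which is where a half-character form is expected. As a rough illustration of why shaping depends on context, the sketch below locates every virama that sits between two consonants. It uses only the Java standard library; the class `ConjunctFinder` is hypothetical, and treating U+0915 to U+0939 as "the consonants" is a simplifying assumption, not a full shaping rule.

```java
import java.util.ArrayList;
import java.util.List;

final class ConjunctFinder {
    static final char VIRAMA = '\u094D'; // DEVANAGARI SIGN VIRAMA

    // Simplification: only the basic consonant range KA..HA is considered.
    static boolean isBasicConsonant(char c) {
        return c >= '\u0915' && c <= '\u0939';
    }

    // Returns the indexes of viramas that are surrounded by two consonants.
    static List<Integer> conjunctCandidates(String text) {
        List<Integer> positions = new ArrayList<>();
        for (int i = 1; i + 1 < text.length(); i++) {
            if (text.charAt(i) == VIRAMA
                    && isBasicConsonant(text.charAt(i - 1))
                    && isBasicConsonant(text.charAt(i + 1))) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        String txt = "\u0938\u093E\u0939\u093F\u0924\u094D\u092F\u0915\u093E\u0930"; // saahityakaar
        System.out.println(conjunctCandidates(txt)); // prints [5]
    }
}
```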
Other scripts: examples
PdfFont font = PdfFontFactory.createFont(arial, PdfEncodings.IDENTITY_H, true);
// Arabic
String txt = "\u0627\u0644\u0643\u0627\u062A\u0628"; // al-katibu
document.add(new Paragraph(txt).setFont(font));
// Latin with optional ligatures (separate snippet)
String txt = "writer";
GlyphLine glyphLine = font.createGlyphLine(txt);
Shaper.applyLigaFeature(foglihtenNo07, glyphLine, null);
canvas.showText(glyphLine);
Status of advanced typography in iText 7
• Indic scripts
  • We already support:
    • Devanagari
    • Tamil
  • Coming soon:
    • Telugu
    • Others: based on customer demand
• Arabic
  • Support for vocalized Arabic (diacritics) is in development
• Latin
  • Optional ligatures are fully supported