10 Things you didn’t know about KTM [email protected] Solution Enablement Specialist

Page 1: 10 Things You Don't Know About KTM

10 Things you didn’t know about KTM

[email protected] Solution Enablement Specialist

Page 2: 10 Things You Don't Know About KTM

What is KTM?

2

Presenter
Presentation Notes
Our KTM is not a motorbike from Austria
Page 3: 10 Things You Don't Know About KTM

What is KTM?

Kofax’ Answer to

Document Drudgery

3

Page 4: 10 Things You Don't Know About KTM

What is KTM?

Kofax’ Intelligent Document

Recognition Solution/Toolkit/Platform

4

Page 5: 10 Things You Don't Know About KTM
Page 6: 10 Things You Don't Know About KTM

The Golden Rule of KTM

6

User productivity?

Automation?

Presenter
Presentation Notes
Automation should always be subordinate to the customer’s primary need, user productivity. Help the customer turn the discussion, requirements, criteria and metrics away from accuracy and toward productivity metrics such as documents/person/day or transactions/hour.
Page 7: 10 Things You Don't Know About KTM

Benefits of „User Productivity“

7

Wholesaler opens its 17th store

Pan European Wholesaler                              Invoices/person/day   Productivity improvement
Manual processing without Kofax                      800
After 3 months of “Accuracy” effort by PS            1200                  +50%
After 2 weeks of “User Productivity” effort by PS    2500                  > 3x
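
The improvement figures on this slide follow from dividing each throughput by the manual baseline. The sketch below only redoes that arithmetic; the function name `improvement` and the labels are illustrative and not part of any Kofax tooling.

```python
def improvement(baseline: float, current: float) -> tuple[float, float]:
    """Return (ratio, percent_change) of current throughput versus baseline."""
    ratio = current / baseline
    return ratio, (ratio - 1.0) * 100.0


BASELINE = 800  # invoices/person/day with manual processing (from the slide)

for label, throughput in [
    ("after accuracy effort", 1200),
    ("after user productivity effort", 2500),
]:
    ratio, pct = improvement(BASELINE, throughput)
    print(f"{label}: {ratio:.3f}x baseline ({pct:+.1f}%)")
```

A ratio of 1.5 corresponds to the slide's +50%, and 2500/800 is a little over 3, which matches the "> 3x" entry.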

Page 8: 10 Things You Don't Know About KTM

The Fallacy of OCR Accuracy

What OCR accuracy do you have?

What is the straight-through processing rate?

How much can we automate?

85% straight-through processing 23 fields → 99.29% field accuracy

6 chars/field → 99.89% character accuracy

What is the cost of the other 15%?

You will lose this deal against an OCR Provider because this deal is being fought over features and tech, and not business value
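
The accuracy figures above can be roughly reproduced under one simplifying assumption, which is mine and not stated on the slide: a document passes straight through only if every field is right, a field is right only if every character is right, and errors are independent. Under that assumption the required per-field and per-character accuracy is a root of the overall rate. The helper name `per_unit_accuracy` is hypothetical.

```python
def per_unit_accuracy(overall_rate: float, units: int) -> float:
    """Accuracy each of `units` independent items needs so that all of them
    succeed together with probability `overall_rate`."""
    return overall_rate ** (1.0 / units)


stp_rate = 0.85        # straight-through processing rate from the slide
fields_per_doc = 23    # from the slide
chars_per_field = 6    # from the slide

field_accuracy = per_unit_accuracy(stp_rate, fields_per_doc)
char_accuracy = per_unit_accuracy(field_accuracy, chars_per_field)

print(f"required field accuracy:     {field_accuracy:.2%}")
print(f"required character accuracy: {char_accuracy:.2%}")
```

The results land within a hundredth of a percent or so of the slide's 99.29% and 99.89%, which illustrates the point: an impressive-sounding character accuracy still leaves 15% of documents for manual handling.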

8

Page 9: 10 Things You Don't Know About KTM

Productivity vs Automation

Productivity (left)
Documents/person/day
User focused
Business value
Optimizing core business processes
Usability/comfort
8hrs/day
Saving $€
Limit = ∞

Automation (right)
Accuracy numbers
Technology focused
Impossible to convert to ROI
Technology
Diminishing returns
Limit = 100%

9

Presenter
Presentation Notes
Use these two lists to see what topics to speak about (on the left) and what topics to avoid (on the right)
Page 10: 10 Things You Don't Know About KTM

Anyone can do KTM

Classify Separate Extract Folder

Validate Learning

10

Page 11: 10 Things You Don't Know About KTM

All you need is paper and highlighters

Classify Separate Extract Folder

Validate

11

Page 12: 10 Things You Don't Know About KTM

“Doing” KTM by hand. Paper to Excel.

12

Page 13: 10 Things You Don't Know About KTM

Classic vs Quantum

13

Newton Einstein

Schrödinger

Bohr

Presenter
Presentation Notes
Einstein was the “Last Classical Physicist” (quote from Bohr), who preferred to sit on the fence: he acknowledged the new Quantum Physics but was unwilling to embrace it himself, unlike the next generation of Niels Bohr and Erwin Schrödinger. Bohr is considered to be one of the most important philosophical minds of the 20th century because he was the tireless herald of Quantum Physics.
Page 14: 10 Things You Don't Know About KTM

God doesn‘t play dice. Spooky Action at a Distance

14

Presenter
Presentation Notes
Einstein didn’t like Quantum Entanglement, which he called “Spooky Action at a Distance” (Spukhafte Fernwirkung). Today this is an accepted fact and the basis of existing quantum cryptography, quantum teleportation (current record: 143 km with light and 21 meters with matter, http://en.wikipedia.org/wiki/Quantum_teleportation) and the qubits of quantum computing. Einstein didn’t like the randomness of Quantum Physics, preferring the order of the Newtonian world. Einstein didn’t like the breakdown of Causality (reversibility of cause and effect) that Quantum Physics took for granted.
Page 15: 10 Things You Don't Know About KTM

Programmable vs Learning Software

Deterministic, logic, rules,

Laws, order

Probabilistic, data-driven,

Machine learning

15

Analytics

Presenter
Presentation Notes
Most software that Kofax and its partners are used to selling and developing is deterministic, programmable software. AltaVista and Yahoo had their day back in the 1990s with rules-based internet search. Google came along in 1998 with a data-driven approach and a simple idea (rank web pages based on references from other pages). This only required them to index the entire internet. History has shown that this was the right approach. Apple Siri is based on many thousands of hours of spoken commands, and it is getting better as it hears more. Microsoft Xbox Kinect sends color-coded images of the game player 60 times a second. This simple solution learns from examples of people doing actions. Most Kofax products are deterministic. It is really only KTM and VRS that are different (as are OCR engines). They do not give guaranteed, predictable results; all results need to be handled with probabilities.
Page 16: 10 Things You Don't Know About KTM

Transition from Determinism to Data-driven/Fuzzy/Quantum

Physics 1890 – 1920 (Classical to Quantum)

Mathematics 1931 (a system cannot demonstrate its own consistency, Kurt Gödel’s incompleteness theorems)

Computer Science 1970 – 1990 (machine learning, neural networks, speech recognition, machine translation)

Business 2000 – 2020 (Big data, analytics, learning systems)

16

Presenter
Presentation Notes
This transition we are seeing today in IT happened 100 years ago in Physics. We are slow in catching up, but the next decade will see a massive growth in machine learning.
Page 17: 10 Things You Don't Know About KTM

EU’s Human Brain Project & USA’s BRAIN Initiative in 2013

About 1 billion € from the EU over 10 years to build a human brain simulator to push forward brain research and test brain diseases.

100M$ from US government to revolutionize our understanding of the human mind and uncover new ways to treat, prevent, and cure brain disorders

“You don’t program it, it learns”

17

Presenter
Presentation Notes
Machine Learning is getting a massive funding boost from the EU, the US government and DARPA in 2013. http://www.humanbrainproject.eu/ http://www.whitehouse.gov/blog/2013/04/02/brain-initiative-challenges-researchers-unlock-mysteries-human-mind http://www.darpa.mil/Our_Work/I2O/Programs/Probabilistic_Programming_for_Advanced_Machine_Learning_(PPAML).aspx
Page 18: 10 Things You Don't Know About KTM

Don’t program KTM, teach it

18

Presenter
Presentation Notes
Robots are programmed. Robots need laws, rules, logic and programs. Dogs are trained. They need to see, mimic and learn. KTM is your puppy that mimics your customer.
Page 19: 10 Things You Don't Know About KTM

Robodog will bite you

19

Presenter
Presentation Notes
This is what happens when you try to program a dog!
Page 20: 10 Things You Don't Know About KTM

“Doing” KTM by hand. Paper to Excel.

20

Presenter
Presentation Notes
Here is the food that you feed to your KTM puppy. The rows in the Excel sheet are “dog food”, from which it learns to mimic human behaviour. That is why this Excel sheet needs to be filled out by your customer’s document experts.
Page 21: 10 Things You Don't Know About KTM

Field Analysis

File Class Capa Nro_DOC NOTA CPFC EMIS VENC EMIT

1.tif CapaLote 123987-2012

2.tif Duplicata 123987-2012 852147-A 60.000,00 07.248.659/0001-03 15/02/2013 15/02/2013 Y

3.tif Duplicata 123987-2012 1489/1 15.963,57 17.155.342/0003-45 01/12/2011 21/01/2012 Y

4.tif Duplicata 123987-2012 3112230U 2.195,30 86.438.280/0001-30 22/12/2011 19/01/2012 Y

5.tif Duplicata 123987-2012 4012391 81.045,00 10.932.276/0001-61 14/12/2011 23/01/2012 Y

6.tif Duplicata 123987-2012 3065357 F 1009,11 80.089.964/0001-97 27/10/2011 21/09/2012 Y

7.tif Nota Fiscal 123987-2012 65357 7.981,39 80.089.964/0001-97

8.tif Nota Fiscal 123987-2012 194.580 48.741,92 76.777.556/0001-50

9.tif Nota Fiscal 123987-2012 000.022.875 32.650,74 56.990.526/0001-10

10.tif Nota Fiscal 123987-2012 112230 2.195,30 86.438.280/0001-30

11.tif Nota Fiscal 123987-2012 194.562 7.454,92 03.364.370/0001-46

21

Presenter
Presentation Notes
Here is an example of “dog food”, “training data” which can be learned/mimicked
Page 22: 10 Things You Don't Know About KTM

Overview of fields to extract – What a customer typically gives

field format number of characters

Document type Validation with Loss Payment Rates Budget Invoice multi-invoice

Reference number numeric 6-7

x New ones have no ref-nr!

x x

Possibly more than one per doc

x

x Multiple

Ref-Nr per Document

CIP database

Debtor Last Name Text unlimited x x x x

only for validation, if existing

only for validation, if existing

CIP database in combination with Ref-Nr

Debtor First Name Text unlimited x x x x

only for validation, if existing

only for validation, if existing

CIP database in combination with Ref-Nr

Debtor Street Text unlimited x x x x CIP database in combination with Ref-Nr

Debtor House number numeric unlimited x x x x

CIP database in combination with Ref-Nr

Debtor Address2 Text unlimited x x x x

CIP database in combination with Ref-Nr

Debtor PostCode numeric 4 Swiss only Swiss only x Swiss only

only for validation, if existing

only for validation, if existing

CIP database in combination with Ref-Nr

Debtor City Text unlimited x x x x only for

validation, if existing

only for validation, if existing

CIP database in combination with Ref-Nr

Debtor Telephone numeric x

CIP database in combination with Ref-Nr. If there is no other number in database, then manual validation.

+42 more rows

22

Presenter
Presentation Notes
Here is a typical database column analysis. This is useful for robotic software like SQL, web services, .NET, etc. This is not really useful to learning software. It doesn’t want rules, it wants examples.
Page 23: 10 Things You Don't Know About KTM
Page 24: 10 Things You Don't Know About KTM

The most successful KTM projects focus on the user.

Make your users happy and content.

KTM is their workplace all day every day.

It is the place of encounter and collaboration between human and robot.

24

Presenter
Presentation Notes
http://terminator.wikia.com/wiki/File:Terminator.jpg http://carlibux.blogspot.co.uk/2010/12/human-vs-robot.html
Page 25: 10 Things You Don't Know About KTM

25

Human – Computer Interaction

Presenter
Presentation Notes
http://carlibux.blogspot.co.uk/2010/12/human-vs-robot.html The human always remains in control. When a customer says they want “80% automation” they are saying, “She should leave the computer, the robot should sit at the keyboard – she can come back for the remaining 20%”. This is not a good approach. A better, user-focused approach would say, “She remains at the desk all the time with her hands on the keyboard. He (KTM) is advising her and making her 5 times more productive”. Perfect collaboration
Page 26: 10 Things You Don't Know About KTM

Validation Experience

Result Type      Correct   Valid   User Experience
True Positive    yes       yes     Perfect! Touchless processing. Automation.
False Negative   yes       no      User must press ENTER.
True Negative    no        no      User corrects/types data.
False Positive   no        yes     Loss of trust. Drop of productivity. Bad data leaves Kofax.
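
The four result types can be read as the combinations of two yes/no questions per field. The sketch below assumes that "correct" means the extracted value matches the document and "valid" means the system marked the field as not needing review. That reading is inferred from the user-experience descriptions on the slide, and the function name `result_type` is hypothetical.

```python
def result_type(correct: bool, valid: bool) -> str:
    """Name the validation outcome for one extracted field.

    correct: the extracted value matches the document (assumed meaning).
    valid:   the system marked the field as not needing review (assumed meaning).
    """
    if valid:
        return "True Positive" if correct else "False Positive"
    return "False Negative" if correct else "True Negative"


# User experience per outcome, shortened from the slide.
USER_EXPERIENCE = {
    "True Positive": "Touchless processing.",
    "False Negative": "User must press ENTER.",
    "True Negative": "User corrects/types data.",
    "False Positive": "Bad data leaves Kofax.",
}

for correct in (True, False):
    for valid in (True, False):
        name = result_type(correct, valid)
        print(f"correct={correct!s:5} valid={valid!s:5} -> {name}: {USER_EXPERIENCE[name]}")
```

Only the false positive lets a wrong value through unreviewed, which is why it costs trust rather than just keystrokes.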

26

Page 27: 10 Things You Don't Know About KTM

KTM Customer Query

This [deal] was sold on the strength of KTM being able to classify and extract data from items received…. This was then used to calculate the ‘RETURN on INVESTMENT’(ROI) which enabled them to purchase the solution. The ROI was calculated with the reasonable estimate of 65% automated processing. I would expect that we should realistically see 80% to 90% automated processing of inbound items. That said, someone communicated to the client that the best they were going to see was 15% to 20% automated processing. This obviously sent the client reeling that they weren’t going to see anything close to their expected ROI and would potentially damage their business and not see the benefits from the system as expected.

Presenter
Presentation Notes
A real customer email to Kofax in 2013. The wrong thinking is encapsulated in the following sentence: “The ROI was calculated with the reasonable estimate of 65% automated processing.” Better would be: “The ROI was calculated with the reasonable estimate of improving user productivity by 150%.”
Page 28: 10 Things You Don't Know About KTM

So what is a reasonable

expectation of KTM?

28

Page 29: 10 Things You Don't Know About KTM

Reasonable Customer Expectations

KTM should be able to significantly improve user productivity (perhaps 1.5-10x).

KTM will be able to extract information perfectly from readable and known documents.

KTM should be able to learn how to understand readable & unknown documents.

KTM’s value is in improving documents/person/day and transactions/second (TPS).

You will have access to near-real-time performance graphs that can optimize user experience and data throughput.

29

Page 30: 10 Things You Don't Know About KTM

Benchmark Before

30

Page 31: 10 Things You Don't Know About KTM

Benchmark During

31

Page 32: 10 Things You Don't Know About KTM

Benchmark After

32

Page 33: 10 Things You Don't Know About KTM

US invoices – known vendors

33

Page 34: 10 Things You Don't Know About KTM

Goals of every KTM Project

1. Human Productivity
2. Eliminate False Positives
   bad data leaving Kofax
3. Reduce False Negatives
   user pressing ENTER
4. Few True Negatives
   OCR Accuracy, Database problems & learning

34

Page 35: 10 Things You Don't Know About KTM

Fuzziness is your friend

35

Kofax brings messy data from the real world into the clear digital world

Page 36: 10 Things You Don't Know About KTM

Fuzziness

Fuzziness is not
Random
Unpredictable
Unreliable
Complex

Fuzziness is
Simple
Learning
Flexible
Tolerant

Fuzzy Software you love
Google Autocomplete
Spell checkers
Grammar checkers
Spam filter
“Users who read this book…”

36

Page 37: 10 Things You Don't Know About KTM

Top Names US Census 2005

37

Top US Names 2005

Rank  Male      Female     Surname
1     JAMES     MARY       SMITH
2     JOHN      PATRICIA   JOHNSON
3     ROBERT    LINDA      WILLIAMS
4     MICHAEL   BARBARA    JONES
5     WILLIAM   ELIZABETH  BROWN
6     DAVID     JENNIFER   DAVIS
7     RICHARD   MARIA      MILLER
8     CHARLES   SUSAN      WILSON
9     JOSEPH    MARGARET   MOORE
10    THOMAS    DOROTHY    TAYLOR

Page 38: 10 Things You Don't Know About KTM

Swiss Forenames

38

Rank  Forename   Share
1     Peter      0.80%
2     Daniel     0.80%
3     Hans       0.67%
4     Christian  0.60%
5     Thomas     0.53%
6     Walter     0.52%
7     Michel     0.49%
8     Martin     0.46%
9     René       0.45%
10    Markus     0.45%
11    Josef      0.44%
12    Patrick    0.43%
13    André      0.42%
14    Bruno      0.41%
15    Philippe   0.40%
16    Maria      0.40%
17    Andreas    0.40%
18    Roland     0.39%
19    Paul       0.39%
20    Marcel     0.39%
21    Werner     0.37%
22    Antonio    0.36%
23    Pierre     0.35%
24    Urs        0.34%
25    Elisabeth  0.34%
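
The slides do not say how KTM uses name lists like these. One plausible use, assumed here, is as a dictionary against which a slightly misread OCR result is fuzzily matched. The sketch below illustrates the idea with Python's standard `difflib`, which is a stand-in and not KTM's matching algorithm. The function name `correct_name`, the cutoff value and the sample inputs are all hypothetical.

```python
from difflib import get_close_matches

# First twelve entries of the Swiss forename list above.
SWISS_FORENAMES = [
    "Peter", "Daniel", "Hans", "Christian", "Thomas", "Walter",
    "Michel", "Martin", "René", "Markus", "Josef", "Patrick",
]


def correct_name(ocr_text: str, cutoff: float = 0.7):
    """Return the closest known forename, or None if nothing is similar enough."""
    matches = get_close_matches(ocr_text, SWISS_FORENAMES, n=1, cutoff=cutoff)
    return matches[0] if matches else None


print(correct_name("Chrlstian"))  # one misread character
print(correct_name("Xyzzy"))      # nothing similar in the list
```

A tolerant lookup of this kind turns a near miss into a usable value, while input that resembles nothing in the list is still left for a human to check.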

Page 39: 10 Things You Don't Know About KTM

39

Page 40: 10 Things You Don't Know About KTM

Uses

40

37,691,912 citizens

Page 41: 10 Things You Don't Know About KTM

KTM is the heart of Kofax.

Touchless Processing

41

Page 42: 10 Things You Don't Know About KTM

KTM Search&Match

Server

Search & Match Server

42

SQL Database Database

Center Firewall

CSV File

Database Fuzzy Index

Page 43: 10 Things You Don't Know About KTM

PDF

PDF vs TIFF

PDF/A is a standard (ISO 19005-1:2005, ISO 19005-2:2011) and future-safe

Thousands of Incompatible File Formats

Baseline TIFF Readers don‘t have to be able to read Group 4.

Any computer can read a PDF, and Chrome can do so „natively“.

Tiff viewers need to be installed.

PDF has layers, TIFF does not.

Searchable

PDF compresses better

TIFFs can be manipulated

PDFs have certificates, encryption, DRM, etc..

Page 44: 10 Things You Don't Know About KTM

PDF High Compression – Should be in every project

602 kb

76 kb (87%)

553 kb

Page 45: 10 Things You Don't Know About KTM

B&W PDF

117 kb

47 kb

114 kb

Page 46: 10 Things You Don't Know About KTM

Atalasoft for PDF generation