Page 1: xreferplus-dereksturdy

Derek Sturdy

Tikit Granite & Comfrey

Non-legal content integration – issues, methods, benefits

Page 2: xreferplus-dereksturdy

Our founders

Sir William Granite, 1738 - 1813

Rev. Dr. Nicholas Comfrey, 1742 - 1818

Page 3: xreferplus-dereksturdy

Their first employee

Miss Emma Hardfarthing, c. 1801

Page 4: xreferplus-dereksturdy

KM in perspective

• matter documents

• know-how

• external resources (including primary law, government, online commentary, non-legal content, "trade" sites, CDs, etc)

• internal marketing, project documents

Page 5: xreferplus-dereksturdy

Outline

• Who needs to link to non-legal content?

• Linking via taxonomies
  – implications for internal taxonomies
  – taxonomy to taxonomy
  – taxonomy to full text

• Linking by straight search

Page 6: xreferplus-dereksturdy

Who are the users?

• Not primarily researchers because they know how to set about it anyway

• Non-legal content linking is mainly for
  – lawyers at their desks
  – marketing people
  – services staff eg IT, secretaries

Page 7: xreferplus-dereksturdy

What do our users have in common?

• They want a complex issue made simple, which is impossible
  – all that silly stuff about "integration" and "just give me a simple box", which results in 75,000 hits or nothing

• They will gratefully accept handsomely presented guidance

Page 8: xreferplus-dereksturdy

What's wrong with Google?

• Nothing at all, except that
  – your users do not know what is verified and what is rubbish
  – even the "advanced" search is just one of those oh-so-nineties field things
  – 50 pages × 20 hits (1,000 results to check) at legal costs = ruin
  – basically, far too much information because of all the junk on the web

Page 9: xreferplus-dereksturdy

What does this actually mean?

• That all successful attempts to integrate valuable content are trying their own methods of getting round the structured – unstructured issue

• Is there a one size fits all answer? No, there isn't. Let's look at that ...

Page 10: xreferplus-dereksturdy

The internal only answer

• Relational databases (ie organised metadata)
  – handle precision and recall
  – handle the updating issues
  – handle lateral linking
  – but sadly ....

• Outside your control is all the other external stuff which is still unstructured "content" – ie straight text – low value, but lots of it!

• This is a temporary phase, but it will see most of us out .....

Page 11: xreferplus-dereksturdy

Ways to approach this

• Autonomy – designed for science, brilliant at science, rubbish for law

• Metadata – which means the taxonomy stuff in terms of added value – designed for soft subjects like law and social science

• Hybrid systems – like xrefer – which use ingenious software to try and cut down the costs of the metadata approach

Page 12: xreferplus-dereksturdy

Why not purely automatic software?

• Because of the tiny legal vocabulary – 5000 terms, instead of 250,000 – with meaning dependent on context

• Because of the citation problem – not to be discussed in detail today

• In essence: automatic software needs one word to have one meaning, which is true in science (normally) and often not true of law (except at the highest level)

Page 13: xreferplus-dereksturdy

What must a taxonomy deliver?

• Real help in finding things

• Therefore - no ambiguity!

• Comfort for users of collections
  – have I got everything relevant? - comprehensiveness
  – have I avoided irrelevance? - accuracy
  – can I easily find similar stuff? - lateral linking

• Is it still true?
  – if the firm knows anything about anything on which practice is based, do I know it too?
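The first two comfort questions can be read as what information retrieval calls recall (comprehensiveness) and precision (accuracy). A minimal sketch of how they might be measured for one search, assuming hypothetical document IDs and a known set of relevant documents; neither appears in the slides:

```python
def recall_and_precision(retrieved, relevant):
    """Return (recall, precision) for a single search."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    # Recall: have I got everything relevant?
    recall = len(hits) / len(relevant) if relevant else 1.0
    # Precision: have I avoided irrelevance?
    precision = len(hits) / len(retrieved) if retrieved else 1.0
    return recall, precision


# Hypothetical document IDs, for illustration only.
relevant = {"kh-1", "kh-2", "kh-3", "kh-4"}
retrieved = ["kh-1", "kh-2", "misc-9"]
recall, precision = recall_and_precision(retrieved, relevant)
```

Here the search finds two of the four relevant documents and returns one irrelevant one, so both figures fall short of 1.0.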

Page 14: xreferplus-dereksturdy

Components: taxonomies

• Thesauri
  – legal subject, legal work type
  – geog./jurisdiction, industry/sector, assets

• Authority files built up for
  – cases
  – legislation
  – own know-how documents
  – grey paper

Page 15: xreferplus-dereksturdy

The Three C’s

• Classification – subject matter

• Categorisation – types of work

• Citation – reference to other documents, but especially to legal authorities
  – Cases
  – Legislation

Page 16: xreferplus-dereksturdy

Where might these be applied?

The Firm: document management, practice management, know-how management

External Resources: general www resources, paid-for online resources, primary law resources

Page 17: xreferplus-dereksturdy

example: Search Engine Applications

general www resources

paid-for online resources

primary law resources

document management

practice management

know-how management

Page 18: xreferplus-dereksturdy

Classification - subject thesauri

general www resources

primary law resources

document management

practice management

know-how management

paid-for online resources

Page 19: xreferplus-dereksturdy

Categorisation – type of work

general www resources

primary law resources

paid-for online resources

practice management

document management

know-how management

Page 20: xreferplus-dereksturdy

Authority Files – exact Citations

general www resources

paid-for online resources

primary law resources

document management

practice management

know-how management

Page 21: xreferplus-dereksturdy

Metadata density and classification workloads

• Matter documents: millions of documents; matter metadata is relatively simple, high volume

• Know How: specialist know-how; know-how metadata is specialist, complex, low volume

• External Sources

Page 22: xreferplus-dereksturdy

Conclusions so far

• Only certain materials within the firm – know-how - will have detailed classification, categorisation and citation work done on them

• Most other materials in the firm will be classified at a high level only, or be classified by inheritance (eg documents within a matter file)
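Classification by inheritance can be sketched roughly as follows. This is an illustration only: the class names, fields and terms are hypothetical and do not describe any particular document management system.

```python
from dataclasses import dataclass, field


@dataclass
class Matter:
    matter_id: str
    subject_terms: set  # high-level terms, applied once to the matter file


@dataclass
class Document:
    doc_id: str
    matter: Matter
    own_terms: set = field(default_factory=set)  # detailed terms, know-how only

    def effective_terms(self):
        # Terms inherited from the matter file plus any applied directly.
        return self.matter.subject_terms | self.own_terms


# Hypothetical matter and documents.
matter = Matter("M-001", {"property"})
letter = Document("D-1", matter)                  # classified by inheritance alone
note = Document("D-2", matter, {"rent review"})   # know-how with detailed work done
```

The ordinary letter costs nothing to classify because it takes its terms from the matter; only the know-how note carries extra, hand-applied detail.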

Page 23: xreferplus-dereksturdy

Direct taxonomy-taxonomy linking

• For the users - seriously cool and dead easy

• For IS staff - match terms not by their letters and spaces but by a one-off human reconciliation of meanings and context – ie some work

• Illustration: PLC
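The difference between matching letters and matching meanings can be sketched as below. All term IDs and labels are hypothetical, and the crosswalk table stands in for the one-off human reconciliation; it is not a description of how any named provider does it.

```python
# Hypothetical internal and external taxonomy terms.
INTERNAL = {
    "int:101": "Consideration (contract)",
    "int:202": "Consideration (M&A price)",
}
EXTERNAL = {
    "ext:A7": "Contract formation: consideration",
    "ext:K3": "Acquisitions: purchase price",
}

# Built once by a person who reconciles meaning and context, term by term.
CROSSWALK = {"int:101": "ext:A7", "int:202": "ext:K3"}


def link_out(internal_id):
    """Follow the human-reconciled link to the external term."""
    external_id = CROSSWALK.get(internal_id)
    return None if external_id is None else EXTERNAL[external_id]


def naive_link(internal_id):
    """Match on letters alone: the first word of the internal label."""
    word = INTERNAL[internal_id].split()[0].lower()
    return [label for label in EXTERNAL.values() if word in label.lower()]
```

For `int:202` the naive match lands on the contract-formation term because the word is the same, while the reconciled crosswalk points to the purchase-price term.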

Page 24: xreferplus-dereksturdy

Hybrid methodologies

• Use the taxonomy to guide the inexperienced user's thoughts to the topic concerned

• Drill-down and drill-up techniques are both useful
  – drill down: start with the general, go to the particular
  – drill up: choose a particular term, see if it exists, see the context and alternative terms
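A minimal sketch of the two techniques over a tiny taxonomy fragment. The terms, the parent table and the alternative-term table are all hypothetical, chosen only to show the navigation.

```python
# Hypothetical taxonomy fragment: term -> broader term (None marks the top).
PARENT = {
    "property": None,
    "leases": "property",
    "rent review": "leases",
    "break clauses": "leases",
    "conveyancing": "property",
}
# Hypothetical alternative terms, each pointing at its preferred term.
ALTERNATIVES = {"tenancies": "leases", "break options": "break clauses"}


def drill_down(term):
    """Start with the general: list the narrower terms directly below it."""
    return sorted(child for child, parent in PARENT.items() if parent == term)


def drill_up(term):
    """Start with the particular: check it exists, then show its context."""
    preferred = ALTERNATIVES.get(term, term)
    if preferred not in PARENT:
        return None  # the term does not exist in the taxonomy
    path = [preferred]
    while PARENT[path[-1]] is not None:
        path.append(PARENT[path[-1]])
    alternatives = sorted(a for a, p in ALTERNATIVES.items() if p == preferred)
    return {"preferred": preferred, "path": path[::-1], "alternatives": alternatives}
```

A user who types "break options" is steered to the preferred term and shown where it sits; a user who starts at "leases" is shown what lies beneath it.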

Page 25: xreferplus-dereksturdy

Transfer the idea

• You then use this approach, developed for your own internal resources – intranets, knowledge systems, DMS – to link out to external resources

• Illustration using xrefer here ....

Page 26: xreferplus-dereksturdy

Implications for your taxonomies

• Ambiguity remains the big enemy!

• Other enemies:
  – "gosh aren't I clever" terms
  – jobs for the boys/girls – which usually result in loss of jobs for the ......

• Pointless complexity is the source of most ambiguity – simplify!
  – segmented taxonomies are the neat way to simplify
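One way to picture a segmented taxonomy is as several small, separate vocabularies that are combined only at search time. The sketch below borrows its segment names from the thesauri listed earlier in the deck; the terms and document IDs are hypothetical.

```python
# Each segment is a short, unambiguous list of its own.
SEGMENTS = {
    "legal_subject": {"employment", "property", "tax"},
    "work_type": {"advice note", "precedent", "checklist"},
    "jurisdiction": {"England & Wales", "Scotland"},
    "sector": {"retail", "energy"},
}


def tag(**terms):
    """Tag a document with at most one term per segment, rejecting strays."""
    for segment, term in terms.items():
        if term not in SEGMENTS.get(segment, set()):
            raise ValueError(f"{term!r} is not a term in segment {segment!r}")
    return dict(terms)


def find(documents, **wanted):
    """Return IDs of documents matching every requested segment term."""
    return sorted(
        doc_id
        for doc_id, tags in documents.items()
        if all(tags.get(segment) == term for segment, term in wanted.items())
    )


# Hypothetical know-how documents.
docs = {
    "kh-1": tag(legal_subject="property", work_type="precedent", sector="retail"),
    "kh-2": tag(legal_subject="property", work_type="checklist", sector="energy"),
    "kh-3": tag(legal_subject="tax", work_type="precedent", sector="retail"),
}
```

Because no single list has to express "retail property precedents", none of the lists grows complicated enough to become ambiguous; the combination is made when searching, for example `find(docs, legal_subject="property", sector="retail")`.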

Page 27: xreferplus-dereksturdy

Ambiguity - continued

• If your taxonomies are not simple enough to avoid ambiguity, you should not be meddling with the idea at all

• Complex taxonomies are for academics with time and a clear need for lateral thinking to the nth degree

• In legal and governmental practice, your users (as defined) may have the brain, but not the time, or may not have the brain

Page 28: xreferplus-dereksturdy

Ambiguity - continued

• Most ambiguity comes from a failure to grasp the point behind the metadata approach, which is "make it easy to find"

• Classification and categorisation are simply tools, not ends in themselves

• Ambiguity is what search engines do!

Page 29: xreferplus-dereksturdy

Direct search

• The user sees a word or phrase ...
  – and does not understand it
  – and wants to know more about it

• In the ideal world, she highlights it, clicks it, and gets seven organised results

• In the real world, this does not happen ... but

Page 30: xreferplus-dereksturdy

Reference linking, concept blow-ups

• The jury is out on this at present

• If you do not know your topic, then you can be misled very easily

• If you have a smattering of knowledge, you can probably navigate successfully

• A little knowledge is much less dangerous than none, despite the proverb to the opposite effect!

Page 31: xreferplus-dereksturdy

Where does this leave us?

• Correct choice of provider

• This remains the only way at present to handle the "integration" and "unstructured data" problem
  – pure software doesn't do it – for us
  – "integrators" are fine for bulletins, but often useless for research and briefings
  – therefore the human element has to be introduced at some point or another

Page 32: xreferplus-dereksturdy

Where is the point of human input?

• the metadata approach: after content has been published, the key content is indexed and abstracted

• the source selection approach: from certain sources, the content is of sufficient quality that it does not need weeding – it's already abstracted, in other words

• two sides of the same coin?

Page 33: xreferplus-dereksturdy

Conclusions

• If you develop a single, large, unsegmented taxonomy, you will be stuck with search-engine approaches to external non-legal content

• If you think beyond legal, to office (ie admin), industries / sectors (ie marketing), and so on, you can develop hybrid approaches

• These will be more powerful than just search engines, though you need those too for esoterica

• The key to this remains: choose your external content providers – give them the problem!

Page 34: xreferplus-dereksturdy

Tikit Granite & Comfrey