Derek Sturdy
Tikit Granite & Comfrey
Non-legal content integration – issues, methods, benefits
Our founders
Sir William Granite, 1738 - 1813
Rev. Dr. Nicholas Comfrey, 1742 - 1818
Their first employee
Miss Emma Hardfarthing, c. 1801
KM in perspective
[Diagram labels:]
• matter documents
• know-how
• internal marketing, project documents
• external resources (including primary law, government, online commentary, non-legal content, "trade" sites, CDs, etc)
Outline
• Who needs to link to non-legal content?
• Linking via taxonomies
– implications for internal taxonomies
– taxonomy to taxonomy
– taxonomy to full text
• Linking by straight search
Who are the users?
• Not primarily researchers, because they know how to set about it anyway
• Non-legal content linking is mainly for
– lawyers at their desks
– marketing people
– services staff, eg IT, secretaries
What do our users have in common?
• They want a complex issue made simple, which is impossible
– all that silly stuff about "integration" and "just give me a simple box", which results in 75,000 hits or nothing
• They will gratefully accept handsomely presented guidance
What's wrong with Google?
• Nothing at all, except that
– your users do not know what is verified and what is rubbish
– even the "advanced" search is just one of those oh-so-nineties field things
– 50 pages * 20 hits at legal costs = ruin
– basically, far too much information because of all the junk on the web
What does this actually mean?
• That all successful attempts to integrate valuable content are trying their own methods of getting round the structured – unstructured issue
• Is there a one size fits all answer? No, there isn't. Let's look at that ...
The internal only answer
• Relational databases (ie organised metadata)
– handle precision and recall
– handle the updating issues
– handle lateral linking
– but sadly ....
• Outside your control is all the other external stuff which is still unstructured "content" – ie straight text – low value, but lots of it!
• This is a temporary phase, but it will see most of us out .....
Ways to approach this
• Autonomy – designed for science, brilliant at science, rubbish for law
• Metadata – which means the taxonomy stuff in terms of added value – designed for soft subjects like law and social science
• Hybrid systems – like xrefer – which use ingenious software to try and cut down the costs of the metadata approach
Why not purely automatic software?
• Because of the tiny legal vocabulary – 5000 terms, instead of 250,000 – with meaning dependent on context
• Because of the citation problem – not to be discussed in detail today
• In essence: automatic software needs one word to have one meaning, which is true in science (normally) and often not true of law (except at the highest level)
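To make the "one word, one meaning" point concrete, here is a minimal sketch. The three documents, their subject tags and both function names are invented for illustration. A match on letters alone returns every sense of "consideration"; a human-assigned subject tag separates the contract-law sense from the everyday one.

```python
# Hypothetical mini-collection: each document carries a human-assigned subject.
DOCS = [
    {"id": 1, "subject": "contract",
     "text": "Past consideration is no consideration."},
    {"id": 2, "subject": "planning",
     "text": "The committee deferred consideration of the application."},
    {"id": 3, "subject": "contract",
     "text": "The promise was supported by valuable consideration."},
]


def keyword_search(term):
    """Match on letters alone: every sense of the word comes back."""
    term = term.lower()
    return [d["id"] for d in DOCS if term in d["text"].lower()]


def subject_search(term, subject):
    """Same match, narrowed by the human-assigned subject tag."""
    term = term.lower()
    return [d["id"] for d in DOCS
            if d["subject"] == subject and term in d["text"].lower()]


print(keyword_search("consideration"))              # all three documents
print(subject_search("consideration", "contract"))  # only the contract ones
```

The software cannot tell documents 1 and 3 from document 2 without the tag, because the word is identical in all three.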
What must a taxonomy deliver?
• Real help in finding things
• Therefore - no ambiguity!
• Comfort for users of collections
– have I got everything relevant? - comprehensiveness
– have I avoided irrelevance? - accuracy
– can I easily find similar stuff? - lateral linking
• Is it still true?
– if the firm knows anything about anything on which practice is based, do I know it too?
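The first two comfort questions can be put as numbers. The sketch below assumes that "comprehensiveness" corresponds to what information retrieval calls recall and "accuracy" to precision; that mapping, and the sample document ids, are assumptions for illustration rather than something the slides state.

```python
def recall(retrieved, relevant):
    """Comprehensiveness: share of the relevant documents that were found."""
    relevant = set(relevant)
    if not relevant:
        return 1.0
    return len(set(retrieved) & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Accuracy: share of the found documents that are actually relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        return 1.0
    return len(retrieved & set(relevant)) / len(retrieved)


# Hypothetical search: four hits, of which two are relevant;
# one relevant document ("d5") was missed.
retrieved = {"d1", "d2", "d3", "d4"}
relevant = {"d1", "d2", "d5"}

print(f"comprehensiveness (recall): {recall(retrieved, relevant):.2f}")
print(f"accuracy (precision):       {precision(retrieved, relevant):.2f}")
```

A search box that returns 75,000 hits scores well on the first measure and badly on the second; one that returns nothing does the reverse.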
Components: taxonomies
• Thesauri
– legal subject, legal work type
– geog./jurisdiction, industry/sector, assets
• Authority files built up for
– cases
– legislation
– own know-how documents
– grey paper
The Three C’s
• Classification – subject matter
• Categorisation – types of work
• Citation – reference to other documents, but especially to legal authorities
– Cases
– Legislation
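One way to picture the Three C's is as three separate fields on a know-how record. This is a sketch only: the class name `KnowHowRecord`, its field names, the helper `citing` and the placeholder keys such as "ACT-001" are hypothetical, standing in for entries in real thesauri and authority files.

```python
from dataclasses import dataclass, field


@dataclass
class KnowHowRecord:
    title: str
    subjects: set = field(default_factory=set)    # Classification: subject matter
    work_types: set = field(default_factory=set)  # Categorisation: type of work
    citations: set = field(default_factory=set)   # Citation: authority file keys


def citing(records, authority_key):
    """Titles of all records that cite a given case or piece of legislation."""
    return [r.title for r in records if authority_key in r.citations]


records = [
    KnowHowRecord("Note on lease renewals", {"property"}, {"advice"}, {"ACT-001"}),
    KnowHowRecord("Dismissal checklist", {"employment"}, {"litigation"},
                  {"CASE-001", "ACT-002"}),
    KnowHowRecord("Rent review precedent", {"property"}, {"drafting"},
                  {"ACT-001", "CASE-002"}),
]

print(citing(records, "ACT-001"))
```

Because a citation is stored as an exact authority-file key rather than as free text, the lookup does not depend on how each author happened to type the reference.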
Where might these be applied?
The Firm: document management, practice management, know-how management
External Resources: general www resources, paid-for online resources, primary law resources
example: Search Engine Applications
general www resources
paid-for online resources
primary law resources
document management
practice management
know-how management
Classification - subject thesauri
general www resources
primary law resources
document management
practice management
know-how management
paid-for online resources
Categorisation – type of work
general www resources
primary law resources
paid-for online resources
practice management
document management
know-how management
Authority Files – exact Citations
general www resources
paid-for online resources
primary law resources
document management
practice management
know-how management
Metadata density and classification workloads
[Diagram labels: Matter documents; Know-how; Matter metadata; Know-how metadata; Relatively simple, high volume; Millions of documents; Specialist, complex, low volume; External sources; Specialist know-how]
Conclusions so far
• Only certain materials within the firm – know-how – will have detailed classification, categorisation and citation work done on them
• Most other materials in the firm will be classified at a high level only, or be classified by inheritance (eg documents within a matter file)
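Classification by inheritance can be sketched as follows. The `Matter` and `Document` classes, their field names and the sample matter reference are hypothetical: a document with no detailed classification of its own takes the high-level subjects of the matter file it sits in.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Matter:
    ref: str
    subjects: frozenset  # high-level classification, done once per matter


@dataclass
class Document:
    name: str
    matter: Optional[Matter] = None
    own_subjects: frozenset = frozenset()  # detailed work, know-how only

    def effective_subjects(self):
        """Use the document's own classification if it has one,
        otherwise inherit from the matter file."""
        if self.own_subjects:
            return self.own_subjects
        if self.matter is not None:
            return self.matter.subjects
        return frozenset()


matter = Matter("M-1001", frozenset({"property"}))
letter = Document("letter to client", matter)
note = Document("know-how note", matter, frozenset({"property", "rent review"}))

print(sorted(letter.effective_subjects()))  # inherited from the matter
print(sorted(note.effective_subjects()))    # its own detailed classification
```

The millions of matter documents cost nothing extra to classify this way; the detailed effort goes only where it pays.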
Direct taxonomy-taxonomy linking
• For the users - seriously cool and dead easy
• For IS staff - match terms not by their letters and spaces but by a one-off human reconciliation of meanings and context – ie some work
• Illustration: PLC
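The "some work" for IS staff amounts to building a reconciliation table once. In this sketch every term on both sides is invented and does not represent any real provider's taxonomy; the point is that the two terms share no words at all, so matching on letters and spaces finds nothing, while the human-made table links them.

```python
import re

# Internal term -> external provider's terms, reconciled once by a person.
# All terms on both sides are hypothetical.
RECONCILED = {
    "Property: landlord and tenant": ["Real estate > Leases"],
    "Employment: unfair dismissal": ["HR > Termination of employment"],
}


def shares_words(a, b):
    """Naive matching 'by letters and spaces': do the terms share any word?"""
    words = lambda s: set(re.findall(r"[a-z]+", s.lower()))
    return bool(words(a) & words(b))


def link_out(internal_term):
    """Follow the human reconciliation to the external taxonomy."""
    return RECONCILED.get(internal_term, [])


print(shares_words("Property: landlord and tenant", "Real estate > Leases"))
print(link_out("Property: landlord and tenant"))
```

Once the table exists, the user simply clicks through from an internal term to the external provider's material.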
Hybrid methodologies
• Use the taxonomy to guide the inexperienced user's thoughts to the topic concerned
• Drill-down and drill-up techniques are both useful
– drill down: start with the general, go to the particular
– drill up: choose a particular term, see if it exists, see the context and alternative terms
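Both techniques can be sketched over a small tree. The terms and the `PARENT` table are hypothetical, as are the function names.

```python
# Hypothetical taxonomy fragment, stored as child -> parent.
PARENT = {
    "Property": None,
    "Landlord and tenant": "Property",
    "Conveyancing": "Property",
    "Rent review": "Landlord and tenant",
    "Lease renewal": "Landlord and tenant",
}


def drill_down(term):
    """Start with the general, go to the particular: list narrower terms."""
    return sorted(child for child, parent in PARENT.items() if parent == term)


def drill_up(term):
    """Choose a particular term: see if it exists, its context, and alternatives."""
    if term not in PARENT:
        return None  # the term does not exist in the taxonomy
    path, node = [], term
    while node is not None:
        path.append(node)
        node = PARENT[node]
    path.reverse()  # broadest term first
    parent = PARENT[term]
    alternatives = sorted(t for t, p in PARENT.items() if p == parent and t != term)
    return {"context": path, "alternatives": alternatives}


print(drill_down("Property"))
print(drill_up("Rent review"))
print(drill_up("Dilapidations"))
```

Drill-up suits the user who has seen a term somewhere and wants to know where it sits; drill-down suits the user who knows only the broad area.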
Transfer the idea
• You then use this approach, developed for your own internal resources – intranets, knowledge systems, DMS – to link out to external resources
• Illustration using xrefer here ....
Implications for your taxonomies
• Ambiguity remains the big enemy!
• Other enemies:
– "gosh aren't I clever" terms
– jobs for the boys/girls – which usually result in loss of jobs for the ......
• Pointless complexity is the source of most ambiguity – simplify!
– segmented taxonomies are the neat way to simplify
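A rough sketch of why segmentation simplifies, with invented segment contents: each segment is a short list with one clear purpose, yet the segments combine at search time to express far more than any one of them holds. The `SEGMENTS` table and `matches` function are hypothetical.

```python
from math import prod

# Hypothetical segments, each a small, unambiguous list.
SEGMENTS = {
    "legal subject": ["contract", "employment", "property"],
    "work type": ["advice", "litigation"],
    "jurisdiction": ["England and Wales", "Scotland"],
}

# Terms someone has to maintain versus combinations a user can express.
terms_to_maintain = sum(len(terms) for terms in SEGMENTS.values())
combinations = prod(len(terms) for terms in SEGMENTS.values())


def matches(doc_tags, query):
    """A document matches if it agrees with the query on every segment asked for."""
    return all(doc_tags.get(segment) == value for segment, value in query.items())


doc = {"legal subject": "property", "work type": "advice",
       "jurisdiction": "Scotland"}

print(terms_to_maintain, combinations)
print(matches(doc, {"legal subject": "property", "jurisdiction": "Scotland"}))
```

A single unsegmented taxonomy would need an entry for each combination it wanted to express, which is where pointless complexity and ambiguity creep in.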
Ambiguity - continued
• If your taxonomies are not simple enough to avoid ambiguity, you should not be meddling with the idea at all
• Complex taxonomies are for academics with time and a clear need for lateral thinking to the n'th degree
• In legal and governmental practice, your users (as defined) may have the brain, but not the time, or may not have the brain
Ambiguity - continued
• Most ambiguity comes from a failure to grasp the point behind the metadata approach, which is "make it easy to find"
• Classification and categorisation are simply tools, not ends in themselves
• Ambiguity is what search engines do!
Direct search
• The user sees a word or phrase ...
– and does not understand it
– and wants to know more about it
• In the ideal world, she highlights it, clicks it, and gets seven organised results
• In the real world, this does not happen ... but
Reference linking, concept blow-ups
• The jury is out on this at present
• If you do not know your topic, then you can be misled very easily
• If you have a smattering of knowledge, you can probably navigate successfully
• A little knowledge is much less dangerous than none, despite the proverb to the opposite effect!
Where does this leave us?
• Correct choice of provider
• This remains the only way at present to handle the "integration" and "unstructured data" problem
• pure software doesn't do it – for us
• "integrators" are fine for bulletins, but often useless for research and briefings
• therefore the human element has to be introduced at some point or another
Where is the point of human input?
• the metadata approach: after content has been published, the key content is indexed and abstracted
• the source selection approach: from certain sources, the content is of sufficient quality that it does not need weeding – it's already abstracted, in other words
• two sides of the same coin?
Conclusions
• If you develop a single, large, unsegmented taxonomy, you will be stuck with search-engine approaches to external non-legal content
• If you think beyond legal, to office (ie admin), industries / sectors (ie marketing), and so on, you can develop hybrid approaches
• These will be more powerful than just search engines, though you need those too for esoterica
• The key to this remains: choose your external content providers – give them the problem!