2-5 Nov 2008 34th ILO Meeting 1
International Atomic Energy Agency
Workshop on Computer-assisted Indexing
Alexander Nevyjel
34th Consultative Meeting of INIS Liaison Officers, 2-5 November 2008, Vienna, Austria
Agenda
• Review CAI procedures (workflow, formats, conventions)
• Thesaurus extension: Hidden terms tables
• Problems and how to overcome them
• Discussion and exchange of experiences
• Hands-on training by INIS Subject Specialists (in their offices, open-ended this afternoon): tips, tricks, recommendations
Objectives of Computer-assisted Indexing
Maintaining database quality
Saving of subject analysis manpower
Improving indexing consistency
CAI Workflow
[Workflow diagram. Inputs: input from Member States (full indexing, or proposed terms/no indexing) and electronic records from publishers (proposed terms/no indexing). Boxes: INIS Subject Analysis Module; CAI Interactive; CAI Offline/Batch; Training of CAI; records with full indexing; records with CAI-suggested descriptors; INIS Verification and Production System. Legend: conventional processing; interactive CAI processing; batch mode.]
CAI Batch and Online Processing
• Input: MemSt-CC-yymmdd-xxxxxxxxxxx
• Output: _MemSt-CC-yymmdd-xxxxxxxxxxx
• MemSt is a standard prefix (meaning “member state”)
• CC is the country code
• yymmdd is the date when the file was generated
• xxxxxxxxxxx is any additional identification
• Examples
  • MemSt-AR-041203-thisismytestfile
  • MemSt-FR-041212-fileidentification
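The naming convention above can be checked mechanically. The following is a minimal sketch, not part of the CAI system: the function names are hypothetical, and it assumes that the country code is two letters and that the additional identification is non-empty.

```python
import re
from datetime import datetime

# MemSt-CC-yymmdd-xxxxxxxxxxx; a leading underscore marks a CAI output file.
# Assumptions: CC is two letters, the identification part is non-empty.
FILENAME_RE = re.compile(
    r"^(?P<output>_?)MemSt-(?P<cc>[A-Za-z]{2})-(?P<date>\d{6})-(?P<ident>.+)$"
)


def parse_cai_filename(name):
    """Split a CAI file name into its parts, or raise ValueError."""
    m = FILENAME_RE.match(name)
    if m is None:
        raise ValueError(f"not a CAI file name: {name!r}")
    # strptime also validates the date, e.g. it rejects month 13.
    generated = datetime.strptime(m.group("date"), "%y%m%d").date()
    return {
        "is_output": m.group("output") == "_",
        "country": m.group("cc").upper(),
        "generated": generated,
        "ident": m.group("ident"),
    }


def output_name(input_name):
    """Name of the CAI output file that belongs to an input file."""
    parse_cai_filename(input_name)  # validate first
    return "_" + input_name.lstrip("_")
```

For the first example above, `parse_cai_filename("MemSt-AR-041203-thisismytestfile")` yields country `AR` and the generation date 3 December 2004.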
CAI Batch Processing
• Output: _MemSt-CC-yymmdd-xxxxxxxxxxx
• These files carry the CAI-suggested descriptors in tag 800, preceded by the string ##CAI suggestions##;
• Example:
  • 800^##CAI suggestions##; DESCRIPTOR1; DESCRIPTOR2; DESCRIPTOR3; …….
• The output files are sent back to the Member State for review
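A reviewer's script could separate the marker from the suggested descriptors as sketched below. This is an illustration only: the helper names are hypothetical, and it assumes that the tag and its content are joined by "^" and that descriptors are separated by ";", as in the example line.

```python
MARKER = "##CAI suggestions##"


def split_tag_800(field):
    """Return (has_marker, descriptors) for one tag 800 line."""
    text = field.strip()
    if text.startswith("800^"):  # assumption: tag and content joined by "^"
        text = text[4:]
    has_marker = MARKER in text
    text = text.replace(MARKER, "")
    # Drop empty pieces left by the leading and trailing semicolons.
    descriptors = [d.strip() for d in text.split(";") if d.strip()]
    return has_marker, descriptors


def build_tag_800(descriptors):
    """Rebuild a reviewed tag 800 line without the CAI marker."""
    return "800^" + "; ".join(descriptors)
```

After review, `build_tag_800` writes the remaining descriptors back without the marker, which corresponds to the step of deleting the string before submission.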
CAI Online
• The file is loaded to CAI Online
• All files of a Member State appear on the queue page as batch MemSt-XX
• Please open only your own batch; do not touch other queues
• Files in a queue are opened one after the other, in the order in which they were loaded
CAI Batch Processing: Reviewing Process
• Delete all suggested descriptors which are too general
• Add relevant descriptors which were not found
  • numerical values, e.g. pressure ranges, temperature ranges, etc.
  • nuclear reactions
  • chemical compounds, alloys, etc.
• CAI cleans up BT/NTs (broader/narrower terms); clean up BT/NTs introduced by manual additions yourself
• Remove wrong suggestions caused by homographic terms
• Delete the marker “##CAI suggestions##”
• Submit file to “INIS Input Box”
CAI Online: Reviewing Process
• Delete all suggested descriptors which are too general
• Add relevant descriptors which were not found
  • numerical values, e.g. pressure ranges, temperature ranges, etc.
  • nuclear reactions
  • chemical compounds, alloys, etc.
• CAI cleans up BT/NTs (broader/narrower terms) and gives warnings for BT/NTs introduced by manual additions
• Remove wrong suggestions caused by homographic terms
• Export the file when finished
• The file is exported to the INIS Production System (or sent back to the Member State for review, if requested)
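A BT/NT check of the kind mentioned in the reviewing steps might look roughly like this. It is a sketch under stated assumptions: it takes "cleaning up BT/NTs" to mean flagging a descriptor that is a broader term of another assigned descriptor, and the small BROADER table is a hypothetical excerpt, not the INIS Thesaurus.

```python
# Hypothetical excerpt of broader-term (BT) relations: narrower -> broader.
BROADER = {
    "ALPHA DECAY": "NUCLEAR DECAY",
    "BETA DECAY": "NUCLEAR DECAY",
    "NUCLEAR DECAY": "DECAY",
}


def ancestors(term):
    """All broader terms of `term`, following the BT chain upwards."""
    found = []
    while term in BROADER:
        term = BROADER[term]
        found.append(term)
    return found


def redundant_broader_terms(descriptors):
    """Descriptors that are broader terms of another assigned descriptor."""
    assigned = set(descriptors)
    redundant = set()
    for d in descriptors:
        redundant.update(a for a in ancestors(d) if a in assigned)
    return sorted(redundant)
```

With the descriptors ALPHA DECAY, NUCLEAR DECAY and CESIUM 137, the function flags NUCLEAR DECAY; whether to delete it remains the reviewer's decision.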
CAI Thesaurus extension
“Hidden terms” are character patterns representing the different ways in which a concept can appear in free text; the concept itself is indexed by one or more descriptors.
• handled similarly to “forbidden terms”, with one or more USE relations
• CAI internal only
• not exported to INIS production system
• not exported to FIBRE
• not printed in any appearance of the thesaurus
• support identification of descriptors in the free text
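The idea of a hidden term with USE relations can be sketched as a simple lookup. This is an assumption-laden illustration, not the CAI implementation: the table is a tiny hypothetical excerpt, and matching here is case-insensitive and whole-word only, which real data may not allow (chemical formulas, for instance, need case-sensitive matching).

```python
import re

# Hypothetical excerpt: hidden term (lower-case pattern) -> descriptors (USE).
HIDDEN_TERMS = {
    "bottom quarks": ["B QUARKS"],
    "behaviour": ["BEHAVIOR"],
    "mossbauer effect": ["MOESSBAUER EFFECT"],
    "co2 laser": ["CARBON DIOXIDE LASERS"],
}


def suggest_descriptors(text):
    """Return descriptors whose hidden terms occur in `text` as whole words."""
    lowered = text.lower()
    found = []
    for pattern, descriptors in HIDDEN_TERMS.items():
        # Lookarounds prevent matches inside longer words.
        if re.search(r"(?<!\w)" + re.escape(pattern) + r"(?!\w)", lowered):
            for d in descriptors:
                if d not in found:
                    found.append(d)
    return found
```

Because the hidden terms live only in this lookup table, they never need to appear in the exported record; only the descriptors they point to do.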
Hidden Terms: Compounds
Descriptor             Hidden term        Free text
MAGNESIUM BORIDES      MgB_2              MgB2
MAGNESIUM CARBONATES   MgCO_3             MgCO3
MAGNESIUM HYDRIDES     MgH_2              MgH2
MAGNESIUM HYDROXIDES   Mg(OH)_2           Mg(OH)2
IRON BROMIDES          iron dibromide
IRON BROMIDES          iron tribromide
ARSENIC IONS           As"3"-             As3-
ACETYLENE              C_2H_2             C2H2
ACETALDEHYDE           C_2H_4O            C2H4O
ACETIC ACID            C_2H_4O_2          C2H4O2
approx. 2000 hidden terms (expected 3000)
Hidden Terms: Isotopes
Descriptor    Hidden term    Free text
CESIUM 137                   Cesium 137, Cesium-137
CESIUM 137    "1"3"7cs       137Cs
CESIUM 137    137 caesium    137 Caesium, 137-Caesium
CESIUM 137    caesium 137    Caesium 137, Caesium-137
CESIUM 137    137 cesium     137 Cesium, 137-Cesium
CESIUM 137    137 cs         137 Cs, 137-Cs
CESIUM 137    137cs          137Cs
CESIUM 137    cs 137         Cs 137, Cs-137
CESIUM 137    cs"1"3"7       Cs137
CESIUM 137    cs137          Cs137
CESIUM 138    "1"3"8"mcs     138mCs
CESIUM 138    cs"1"3"8"m     Cs138m
approx. 26,000 hidden terms
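The large number of isotope hidden terms follows from combining a few spellings per isotope. The sketch below generates the patterns listed for CESIUM 137; the function name and its parameters are hypothetical, and the rule that each superscript character is preceded by a double quote is inferred from the table rather than from a specification.

```python
def isotope_hidden_terms(names, symbol, mass):
    """Generate lower-case hidden-term patterns for one isotope.

    names:  spellings of the element name, e.g. ["cesium", "caesium"]
    symbol: element symbol, e.g. "Cs"
    mass:   mass number as text, e.g. "137" or "138m"
    """
    # Superscript notation inferred from the table: 137 -> "1"3"7
    sup = "".join('"' + ch for ch in mass)
    s = symbol.lower()
    terms = []
    for name in names:
        terms += [f"{name} {mass}", f"{mass} {name}"]
    terms += [
        f"{mass} {s}", f"{mass}{s}", f"{s} {mass}", f"{s}{mass}",
        f"{sup}{s}", f"{s}{sup}",
    ]
    return terms
```

Calling `isotope_hidden_terms(["cesium", "caesium"], "Cs", "137")` gives ten patterns, one of which ("cesium 137") coincides with the descriptor itself.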
Hidden Terms: Elementary Particles
Descriptor                   Hidden term      Free text
B QUARKS                     bottom quarks
T QUARKS                     top quarks
ELECTRON NEUTRINOS           #nu#_e           νe
MUON NEUTRINOS               #nu#_#mu#        νμ
TAU NEUTRINOS                #nu#_#tau#       ντ
RHO-770 MESONS               #rho#(770)       ρ(770)
RHO-770 MESONS               #rho#-770        ρ-770
OMEGA-782 MESONS             #omega#(782)     ω(782)
OMEGA-782 MESONS             #omega#-782      ω-782
KAONS NEUTRAL                K"0              K0
KAONS NEUTRAL SHORT-LIVED    K"0_S            K0S
KAONS NEUTRAL LONG-LIVED     K"0_L            K0L
approx. 300 hidden terms
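The relation between the hidden-term column and the free-text column in the compound, isotope and particle tables can be expressed as a small conversion. This is a sketch based only on the rows shown: it assumes that `#name#` stands for a Greek letter, that `"` marks a superscript and that `_` marks a subscript; the function name is hypothetical.

```python
import re

# Greek letter names used in the tables; other names are left untouched.
GREEK = {"nu": "ν", "mu": "μ", "tau": "τ", "rho": "ρ", "omega": "ω", "gamma": "γ"}


def flatten_hidden_term(term):
    """Convert hidden-term notation into the flat string seen in free text."""
    term = re.sub(
        r"#([A-Za-z]+)#",
        lambda m: GREEK.get(m.group(1), m.group(0)),
        term,
    )
    # Drop the superscript (") and subscript (_) markers.
    return term.replace('"', "").replace("_", "")
```

For example, `Mg(OH)_2` becomes `Mg(OH)2` and `#nu#_#mu#` becomes `νμ`. Letter case is not changed, so the isotope pattern `"1"3"7cs` becomes `137cs` and would still need a case-insensitive comparison to match `137Cs`.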
Hidden Terms: UK/US Spellings
Descriptor                   Hidden term
A CENTERS                    a centres
ACTIVITY METERS              activity metres
ANALOG COMPUTERS             analogue computers
ANALOG SYSTEMS               analogue systems
ANESTHESIA                   anaesthesia
ARCHAEOLOGY                  archeology
AUSTRIAN ORGANIZATIONS       austrian organisations
BALLISTIC MISSILE DEFENSE    ballistic missile defence
BAYARD-ALPERT GAGES          bayard-alpert gauges
BEAM ANALYZERS               beam analysers
BEHAVIOR                     behaviour
CATALOGS                     catalogues
approx. 800 hidden terms
Hidden Terms: Diacritics and Countries
Descriptor                  Hidden term
Diacritics:
BAECKLUND TRANSFORMATION    backlund transformation
BRUECKNER METHOD            bruckner method
BRUECKNER MODEL             bruckner model
BRUNSBUETTEL REACTOR        brunsbuttel reactor
MOESSBAUER EFFECT           mossbauer effect
Country Names:
CAMBODIA                    kampuchea
COTE D'IVOIRE               ivory coast
GREECE                      hellas
MYANMAR                     burma
SYRIA                       syrian arab republic
THAILAND                    siam
approx. 250 hidden terms
Hidden Terms: Other Spellings
Descriptor                  Hidden term
Singular/Plural
FUNGI                       fungus
FUNGI                       funguses
G MATRIX                    g matrices
G MATRIX                    g matrixes
Reverse Sequence
ATOM-MOLECULE COLLISIONS    molecule-atom collisions
ATOM-MOLECULE COLLISIONS    atom-molecule scattering
ATOM-MOLECULE COLLISIONS    molecule-atom scattering
ATOM-MOLECULE COLLISIONS    atom-molecule reactions
ATOM-MOLECULE COLLISIONS    molecule-atom reactions
ATOM-MOLECULE COLLISIONS    atom-molecule interactions
ATOM-MOLECULE COLLISIONS    molecule-atom interactions
approx. 900 hidden terms
Hidden Terms: Other Spellings
Descriptor             Hidden term
Grammatical Variations
PERIODICITY            periodic
PERIODICITY            periodical
PERIODICITY            periodically
Phrases versus compound terms
RADIOWAVE RADIATION    radio wave
SPACE-TIME             spacetime
WAVE FUNCTIONS         wavefunction
Terminology
GAMMA SPECTROMETERS    #gamma#ray spectrometer
GAMMA SPECTROMETERS    #gamma#-ray spectrometer
GAMMA SPECTROMETERS    gammaray spectrometer
GAMMA SPECTROMETERS    gamma-ray spectrometer
Hidden Terms: Other Spellings
Descriptor                    Hidden term
Terminology
SU-2 GROUPS                   su(2) theory
SU-2 GROUPS                   su(2) symmetry
SU-3 GROUPS                   su(3) theory
SU-3 GROUPS                   su(3) symmetry
Abbreviations
CARBON DIOXIDE LASERS         CO_2 laser
CARBON DIOXIDE LASERS         CO2 laser
KOBAYASHI-MASKAWA MATRIX      CKM matrix
KORTEWEG-DE VRIES EQUATION    kdv equation
Numerical Values
KEV RANGE                     kev
MEV RANGE                     mev
GEV RANGE                     gev
CAI Thesaurus Extension
• Thesaurus
  • Valid Descriptors 21,147
  • Forbidden Terms 9,114
• CAI
  • Hidden Terms 34,105
• Total 64,366
Terminological Knowledge Base
Terms which need special attention: Numerical values, ranges
• ENERGY RANGES
  • MEV RANGE
    • MEV RANGE 01-10
    • MEV RANGE 10-100
    • MEV RANGE 100-1000
• PRESSURE RANGES
  • Recognize pressure ranges
  • Translate from atm, bar, torr to Pascal
• TEMPERATURE RANGES
  • Recognize temperature ranges
  • Translate from Celsius, Fahrenheit to Kelvin
  • Attention: the forbidden term (since 1992) “high temperature USE TEMPERATURE RANGE 0400-1000 K” often leads to wrong results
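The unit translations and the choice of an energy range descriptor can be sketched as follows. The conversion factors are the standard definitions; the function names are hypothetical, and the treatment of boundary values (for example, whether exactly 10 MeV belongs to 01-10 or 10-100) is an assumption, since the slide does not state it.

```python
# Conversion factors to pascal (1 torr is defined as 101325/760 Pa).
PA_PER_UNIT = {"pa": 1.0, "bar": 1.0e5, "atm": 101325.0, "torr": 101325.0 / 760.0}


def to_pascal(value, unit):
    """Convert a pressure given in Pa, bar, atm or torr to pascal."""
    return value * PA_PER_UNIT[unit.lower()]


def to_kelvin(value, unit):
    """Convert a temperature given in K, C or F to kelvin."""
    unit = unit.upper()
    if unit == "K":
        return value
    if unit == "C":
        return value + 273.15
    if unit == "F":
        return (value - 32.0) * 5.0 / 9.0 + 273.15
    raise ValueError(f"unknown temperature unit: {unit}")


# Descriptors from the slide; lower bound inclusive, upper bound exclusive
# is an assumption made for this sketch.
MEV_RANGES = [
    (1, 10, "MEV RANGE 01-10"),
    (10, 100, "MEV RANGE 10-100"),
    (100, 1000, "MEV RANGE 100-1000"),
]


def mev_range(energy_mev):
    """Return the MEV RANGE descriptor for an energy, or None if outside."""
    for low, high, descriptor in MEV_RANGES:
        if low <= energy_mev < high:
            return descriptor
    return None
```

A value of 14 MeV maps to MEV RANGE 10-100, and 25 degrees Celsius converts to 298.15 K, which lies below the 400-1000 K range that the forbidden term "high temperature" would assign.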
Terms which need special attention: Multi-meaning
• “+” and “-” signs
  • K+ → KAONS PLUS, KAONS MINUS, POTASSIUM IONS
• Case sensitivity
  • TiN → TIN (instead of TITANIUM NITRIDES)
  • “… this can be …” → CALCIUM NITRIDES (CaN)
  • gas → GALLIUM SULFIDES (GaS)
  • “… who is the …” → WHO (World Health Organization)
• Verbs versus nouns
  • “… this leads us to …” → LEAD
  • “… this leaves it …” → LEAVES
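The case-sensitivity problem can be illustrated with a small matcher that compares chemical formulas with exact case and ordinary words without regard to case. This is a sketch of one possible safeguard, not a description of how CAI works; both tables are hypothetical excerpts.

```python
import re

# Formulas are matched with exact case, ordinary words case-insensitively.
FORMULAS = {
    "TiN": "TITANIUM NITRIDES",
    "CaN": "CALCIUM NITRIDES",
    "GaS": "GALLIUM SULFIDES",
}
WORDS = {"tin": "TIN"}


def suggest(text):
    """Suggest descriptors for the alphabetic tokens in `text`."""
    found = []
    for token in re.findall(r"[A-Za-z]+", text):
        if token in FORMULAS:              # exact case: "TiN" but not "tin"
            found.append(FORMULAS[token])
        elif token.lower() in WORDS:
            found.append(WORDS[token.lower()])
    return found
```

With exact-case matching, "this can be a gas" produces no suggestion, while "TiN coatings" yields TITANIUM NITRIDES rather than TIN.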
Terms which need special attention: Multi-meaning
• MPA
  • MAXIMUM PERMISSIBLE ACTIVITY
  • Megapascal (MPa)
• GDP
  • GROSS DOMESTIC PRODUCT
  • GADOLINIUM PHOSPHIDES (GdP)
• COBRA → SNAKES
  • COBRA REACTOR → KBR-1 REACTOR
• “… in isotopes …” → INDIUM ISOTOPES
• “… at 195 deg K …” → ASTATINE 195
Terms which need special attention
• Homographic terms
  • Solutions → SOLUTIONS or MATHEMATICAL SOLUTIONS
  • Color → COLOR, COLOR CENTRES, COLOR MODEL
  • Flavor → FLAVOR, FLAVOR MODELS
  • Tunnel → TUNNELS, TUNNELING, TUNNEL EFFECT
• Nuclear reactions, e.g. 14N(γ,α)10B
  • Targets
  • Beams
  • Reactions
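The compact reaction notation can be split into the parts an indexer needs (target, beam, reaction). The sketch below parses only the simple form target(projectile,ejectile)residual shown in the example; the function name and the dictionary keys are hypothetical, and no descriptors are derived from the parts.

```python
import re

# Compact notation: target(projectile,ejectile)residual, e.g. 14N(γ,α)10B.
REACTION_RE = re.compile(
    r"^(?P<target>\d+[A-Z][a-z]?)"
    r"\((?P<projectile>[^,()]+),(?P<ejectile>[^,()]+)\)"
    r"(?P<residual>\d+[A-Z][a-z]?)$"
)


def parse_reaction(notation):
    """Split a reaction such as 14N(γ,α)10B into its four parts."""
    m = REACTION_RE.match(notation.replace(" ", ""))
    if m is None:
        raise ValueError(f"unrecognized reaction notation: {notation!r}")
    return m.groupdict()
```

For `14N(γ,α)10B` this gives the target 14N, the projectile γ (the beam), the ejectile α and the residual nucleus 10B; choosing the matching target, beam and reaction descriptors remains a manual step.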
Terms which need special attention: Terms which are often wrong
• Production
  • BEAM PRODUCTION
  • HEAT PRODUCTION
  • HYDROGEN PRODUCTION
  • ISOTOPE PRODUCTION
  • PARTICLE PRODUCTION
  • PLASMA PRODUCTION
  • PRODUCTION
• Transport
  • AIR TRANSPORT
  • ATOM TRANSPORT
  • BEAM TRANSPORT
  • CHARGED-PARTICLE TRANSPORT
  • ENVIRONMENTAL TRANSPORT
  • PHOTON TRANSPORT
  • RADIOACTIVITY TRANSPORT
  • TRANSPORT
• Decay
  • NUCLEAR DECAY
    • ALPHA DECAY
    • BETA DECAY
    • …….
  • PARTICLE DECAY
    • ELECTROMAGNETIC…
    • HADRONIC…
    • RADIATIVE…
    • WEAK…
CAI Hands-on training by Subject Specialists
Physics         Marija Sejmenova-Gichevska    A2477
Chemistry       Christine Krieger-Levine      A2478
Reactors        Neviana Rashkova              A2479
Life Science    Bekele Negeri                 A2480