DESCRIPTION
Xi Van Fleet of the American Society of Civil Engineers (ASCE) shares her experience on rule-building, utilizing Access Innovations' Data Harmony machine-aided indexing software, as well as free online resources.
LESSONS LEARNED FROM PREPARING ASCE TAXONOMY FOR MACHINE AIDED INDEXING (MAI)
Xi Van Fleet
Senior Manager of Information Services
Publishing Technology Department
Publication Division
American Society of Civil Engineers
Publications of the American Society of Civil Engineers
A Brief History
The American Society of Civil Engineers (ASCE) was founded in 1852. We are the oldest engineering society in the United States.
Our first publication, Transactions of the American Society of Civil Engineers, was published in 1872. It is the predecessor of our journals.
The first monograph was published in 1892.
Publications of the American Society of Civil Engineers
Today
Leading publisher in civil engineering
34 Peer-reviewed journals
Books and standards
Conference proceedings
Magazines
Online Civil Engineering Knowledge Environment
250+ ASCE e-book titles
65 ASCE Standards
Proceedings volumes with 42,000 papers from 2000 to present
Peer-reviewed journals with 60,000 papers from 1983 to present
More than 220,000 records with complete coverage of ASCE publications
Full-text database
Bibliographic database
Content driven
Overlapping with other engineering disciplines, e.g. chemical engineering, mechanical engineering, materials engineering
Strong on core disciplines: e.g. structural engineering, geotechnical engineering
Weaker on peripheral disciplines: Aerospace engineering, energy engineering
ASCE Taxonomy
The taxonomy project started in 2009
Access Innovations created the first version based on the existing CEDB subject headings and data mined from the content
The draft contained over 30,000 terms. We divided it into three individual taxonomies:
Technical topics
Geographic terms
ASCE corporate
In-house subject experts of different disciplines were invited to validate the technical topics.
Project History
“Final” Version of Taxonomy of Technical Topics
Preferred terms: 2440
Equivalent terms: 3167
Top terms: 22
Terms with "Related Terms": 488
Terms with "Non-Preferred Terms": 1320
Prepare ASCE Taxonomy for Machine Aided Indexing (MAI)
• Taxonomy enrichment
• Rule building
Taxonomy Enrichment
Add Equivalent/Non-Preferred Terms
• Alternative spelling: Analysis – Analyses; Modeling vs. modelling
• Irregular word forms: Curricula vs. Curriculums
• Synonyms: Flood – inundation; Health care facilities – Hospitals, Nursing homes…
• Acronyms: Automated people movers – APM
• Term variation: Bedforms, Bed-forms, Bed forms
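The enrichment step above amounts to pointing every equivalent term at one preferred term. A minimal Python sketch of that lookup follows. The dictionary layout and the function names are assumptions made for illustration, not how Data Harmony stores a thesaurus, and the choice of which form is "preferred" in each entry is also assumed.

```python
# Hypothetical in-memory thesaurus: preferred term -> equivalent (non-preferred) terms.
# Entries come from the examples on this slide; the structure itself is an assumption.
EQUIVALENTS = {
    "Flood": ["Inundation"],
    "Health care facilities": ["Hospitals", "Nursing homes"],
    "Automated people movers": ["APM"],
    "Bedforms": ["Bed-forms", "Bed forms"],
}


def build_lookup(equivalents):
    """Map every preferred and non-preferred form (case-insensitive) to its preferred term."""
    lookup = {}
    for preferred, variants in equivalents.items():
        for term in [preferred, *variants]:
            lookup[term.casefold()] = preferred
    return lookup


def to_preferred(term, lookup):
    """Return the preferred term for any known form, or None if the form is unknown."""
    return lookup.get(term.casefold())


LOOKUP = build_lookup(EQUIVALENTS)
print(to_preferred("bed-forms", LOOKUP))
print(to_preferred("APM", LOOKUP))
```

Every form added here is one more text string the indexer can recognize without a hand-written rule.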
Rule Building
Rules teach MAIStro to think like humans by providing it with context, logic, and instructions.
Simple rules
Simple conditional rules
Complex conditional rules
Resources Used
Some synonyms are obvious and easy.
e.g. Preferred term: Driver behavior
Equivalent/Non-Preferred Terms
How to find synonyms
Some synonyms are “hidden”, e.g. Agricultural wastes
Equivalent/Non-Preferred Terms
Preferred term: Public health and safety
How to find synonyms
Preferred term: Public health and safety
Note: in our content, “health” can also refer to a structure, a river, or the environment.
Equivalent/Non-Preferred Terms
Preferred term: Intelligent transportation systems
How to find synonyms
Equivalent/Non-Preferred Terms
Preferred term: High-rise buildings
e.g. Spring Temple Buddha
Tokyo Spring Tree
Preferred term: Developing countries
ASCE taxonomy term: Civil engineering landmarks
ASCE Civil engineering landmarks Award list
How to find synonyms
Equivalent/Non-Preferred Terms
Irregular words
Preferred term: Labor
Non-preferred term: Labour
Preferred term: Structural behavior
Non-preferred term: Structural behaviour
Preferred term: Multi-story buildings
Non-preferred term: Multi-storey buildings
Preferred term: Fiber reinforced polymer
Non-preferred term: Fibre reinforced polymer
Equivalent/Non-Preferred Terms
Think about variation
Terms made of phrase with variations
Preferred term: Lightweight concrete
Non-Preferred terms: Light-weight concrete, Light weight concrete
Preferred term: Design/Bid/Build
Non-Preferred terms: Design-bid-build, Design bid build, D/B/B/, DBB, D-B-B
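Separator variants like these can be grouped mechanically before they are entered as non-preferred terms. The sketch below is an illustrative helper, not part of the ASCE workflow: it reduces a term to a key by dropping spaces, hyphens, and slashes, so that all spellings of one phrase share a key. `variation_key` and `group_variants` are hypothetical names.

```python
import re
from collections import defaultdict


def variation_key(term):
    """Collapse spaces, hyphens and slashes, and ignore case, to get a comparison key."""
    return re.sub(r"[\s\-/]+", "", term).casefold()


def group_variants(terms):
    """Group candidate terms that differ only by separators or letter case."""
    groups = defaultdict(list)
    for term in terms:
        groups[variation_key(term)].append(term)
    return dict(groups)


candidates = [
    "Design/Bid/Build", "Design-bid-build", "Design bid build",
    "D/B/B/", "DBB", "D-B-B",
]
for key, members in group_variants(candidates).items():
    print(key, "->", members)
```

Removing every space is deliberately aggressive. It can merge phrases that are unrelated, so the groups are candidates for human review rather than automatic equivalents.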
Equivalent/Non-Preferred Terms
Think about variation
Equivalent/Non-Preferred Terms
Terms with prefix
Bio + Preferred terms: Biobinders; Biofuels; Biocement; Biokinetics; Biofilters; Biofouling; Biogrouting; Bioleaching…
Post + Preferred terms: Postearthquakes; Postcombustion; Postcracking
Other prefixes: Pre, Micro, Macro, Super, Multi, Non, Off...
Think about variation
Acronyms
Preferred term: Magnetic levitation trains
Non-preferred term: Maglev
Preferred term: Automated people movers
Non-preferred term: APM
Preferred term: Air traffic control
Acronym: ATC
ATC can also mean apparent tardiness cost, Applied Technology Council, … It needs disambiguation.
Preferred term: Intelligent transportation systems
Acronym: ITS
Be careful with acronyms
Equivalent/Non-Preferred Terms
Create Rulebase
MAIStro automatically creates a text-to-match (TTM) rule for every term, both preferred and non-preferred
TTM works for many terms:
Flash floods – Flash floods
Continuing education – Continuing education
Ridership – Ridership
Hydraulic engineering – Hydraulic engineering
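A text-to-match rule can be pictured as a literal phrase search. The Python sketch below is only an approximation for illustration; how MAIStro actually matches text (case handling, word boundaries) is not described in these slides, and `ttm_match` is a hypothetical name.

```python
import re


def ttm_match(term, text):
    """Return True when the term occurs in the text as a whole phrase, ignoring case."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


print(ttm_match("Flash floods", "Urban flash floods are becoming more frequent."))
print(ttm_match("Lateral loads", "The piles were laterally loaded."))
```

The second call shows the limit of literal matching: the concept is present, but the exact phrase is not.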
Text that matches
Create Rulebase
Noun vs. verb vs. adjective vs. adverb
Preferred term: Corrosion
Corrosive, Corrosiveness, Corrosivity, Corroding, Corroded, Corrodible, Corrodibility…
Simple rule:
Corros* USE Corrosion
Corrod* USE Corrosion
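The two simple rules above behave like prefix searches. A minimal Python sketch, assuming `*` means "any run of word characters" (the slides do not define the wildcard precisely); the rule table and function names are hypothetical:

```python
import re

# Truncated text -> preferred term, mirroring the two simple rules on this slide.
SIMPLE_RULES = {"corros*": "Corrosion", "corrod*": "Corrosion"}


def compile_truncation(pattern):
    """Turn a pattern such as 'corros*' into a case-insensitive whole-word regex."""
    stem = pattern.rstrip("*")
    suffix = r"\w*" if pattern.endswith("*") else ""
    return re.compile(r"\b" + re.escape(stem) + suffix + r"\b", re.IGNORECASE)


def apply_simple_rules(text, rules):
    """Return the set of preferred terms whose truncated pattern occurs in the text."""
    return {term for pattern, term in rules.items()
            if compile_truncation(pattern).search(text)}


print(apply_simple_rules("The corroded rebar shows high corrosivity.", SIMPLE_RULES))
```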
Text that doesn't quite match (variations)
Create Rulebase
Preferred term: Lateral loads
Variations: Lateral loading; Laterally loaded…
Need a simple conditional rule:
load*
IF (WITH "lateral*")
USE Lateral loads
ENDIF
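The conditional rule above can be sketched in Python. This assumes that WITH means "in the same sentence as the matched text". That reading is an assumption, since the slides do not define the operators, and the sentence splitter here is deliberately naive.

```python
import re


def sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text) if s]


def has_word(pattern, sentence):
    """True if a word matching the truncated pattern (e.g. 'load*') is in the sentence."""
    stem = pattern.rstrip("*")
    suffix = r"\w*" if pattern.endswith("*") else ""
    regex = r"\b" + re.escape(stem) + suffix + r"\b"
    return re.search(regex, sentence, re.IGNORECASE) is not None


def lateral_loads_rule(text):
    """load* IF (WITH "lateral*") USE Lateral loads, with WITH read as 'same sentence'."""
    for sentence in sentences(text):
        if has_word("load*", sentence) and has_word("lateral*", sentence):
            return "Lateral loads"
    return None


print(lateral_loads_rule("The piles were laterally loaded."))
print(lateral_loads_rule("The load was vertical. Lateral drift was small."))
```

The second text contains both words, but in different sentences, so under this reading the rule does not fire.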
Text that doesn't quite match (variations)
Create Rulebase
Variations of “Span bridges”
Bridge*
IF (NEAR "span" OR NEAR "short-span" OR NEAR "long-span" OR NEAR "single-span" OR NEAR "multi-span" OR NEAR "multiple-span" OR NEAR "four-span" OR NEAR "three-span" OR NEAR "one-span" OR NEAR "continuous-span" OR NEAR "simple-span" OR NEAR "large-span")
USE Span bridges
ENDIF
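A proximity condition such as NEAR can be sketched as a token-window check. The window of three words used below is an assumption chosen for illustration, not a documented MAIStro value, and the context list is a shortened version of the one in the rule.

```python
import re

# Shortened, illustrative subset of the context words in the rule above.
SPAN_WORDS = {"span", "short-span", "long-span", "single-span", "multi-span", "three-span"}


def tokens(text):
    """Lower-case word tokens; hyphenated words such as 'three-span' stay together."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def near(token_list, trigger_prefix, context_words, window=3):
    """True if a token starting with trigger_prefix has a context word within `window` tokens."""
    for i, tok in enumerate(token_list):
        if tok.startswith(trigger_prefix):
            neighbours = token_list[max(0, i - window):i] + token_list[i + 1:i + window + 1]
            if any(n in context_words for n in neighbours):
                return True
    return False


def span_bridges_rule(text):
    """Bridge* IF (NEAR one of the span words) USE Span bridges."""
    return "Span bridges" if near(tokens(text), "bridge", SPAN_WORDS) else None


print(span_bridges_rule("A three-span continuous girder bridge was tested."))
print(span_bridges_rule("The bridge deck was repaved."))
```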
Text that doesn't quite match (variations)
Create Rulebase
Find hyphenated terms in our content
Preferred term: Structural analysis
Analy*
IF (WITH "structur*" OR WITH "load" OR WITH "loads")
IF (NEAR "arch*" OR WITH "column*" OR NEAR "bar" OR NEAR "bars" OR NEAR "bar's" OR NEAR "beam" OR NEAR "beams" OR NEAR "strut" OR NEAR "struts" OR NEAR "compression member*" OR NEAR "tie" OR NEAR "ties" OR NEAR "tie rod" OR NEAR "tie-rod" OR NEAR "tie rods" OR NEAR "tie-rods" OR NEAR "eyebar*" OR NEAR "guy-wire*" OR NEAR "guy wire*" OR NEAR "suspension cable*" OR NEAR "wire rope*" OR NEAR "angle section*" OR NEAR "connect*" OR NEAR "coupl*" OR NEAR "diaphragm*" OR NEAR "flange*" OR NEAR "frame*" OR NEAR "bent" OR NEAR "bents" OR NEAR "girder*" OR NEAR "hollow section*" OR NEAR "hollow structural section*" OR NEAR "joint*" OR NEAR "joist*" OR NEAR "membrane*" OR NEAR "panel" OR NEAR "plate" OR NEAR "slab*" OR NEAR "stud" OR NEAR "studs" OR NEAR "tendon*" OR NEAR "tensile member*" OR NEAR "truss*" OR NEAR "tube*" OR NEAR "wall*" OR NEAR "gable*" OR NEAR "wall section*" OR MENTIONS "structural failure*" OR MENTIONS "building failure*")
USE Structural analysis
ENDIF
ENDIF
Create Rulebase
Text that doesn’t quite match (whole vs parts)
Bridge the gap
Raising the bar
Foundation: a solid foundation, a firm foundation, research foundation…
Toll: Toll Brothers, human toll, take a toll…
Using NULL rules
right match that is wrong
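One way to picture a NULL rule is as a list of phrases that are blanked out before any term matching happens. The sketch below illustrates that idea only; it is not MAIStro's NULL syntax, which these slides do not show. The phrase list and function names are hypothetical.

```python
import re

# Idioms and proper names from this slide that should not trigger Bridges, Bars or Tolls.
NULL_PHRASES = ["bridge the gap", "raising the bar", "take a toll", "human toll", "Toll Brothers"]


def mask_null_phrases(text, phrases=NULL_PHRASES):
    """Blank out every listed phrase so later matching cannot see it."""
    for phrase in phrases:
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", " ", text, flags=re.IGNORECASE)
    return text


def mentions_stem(stem, text):
    """True if a word starting with the stem survives after the NULL phrases are masked."""
    masked = mask_null_phrases(text)
    return re.search(r"\b" + re.escape(stem) + r"\w*", masked, re.IGNORECASE) is not None


print(mentions_stem("bridge", "We need to bridge the gap between research and practice."))
print(mentions_stem("bridge", "The bridge deck cracked after the flood."))
```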
Create Rulebase - To Disambiguate
Create Rulebase
Phrases that contain more than one term
Text: Continuous Multispan Concrete Girder Highway Bridges
Preferred terms:
Continuous bridges
Span bridges
Concrete bridges
Girder bridges
Highway bridges
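The example above can be read as one head noun ("bridges") whose modifiers each point to a separate preferred term. A small Python sketch of that reading follows. The modifier table is hypothetical and covers only this one phrase; in practice each term would be handled by its own rule.

```python
# Hypothetical mapping from a modifier of "bridge(s)" to the preferred term it implies.
MODIFIER_TO_TERM = {
    "continuous": "Continuous bridges",
    "multispan": "Span bridges",
    "concrete": "Concrete bridges",
    "girder": "Girder bridges",
    "highway": "Highway bridges",
}


def terms_for_bridge_phrase(phrase):
    """Return one preferred term per known modifier when the phrase ends in 'bridge(s)'."""
    words = phrase.lower().split()
    if not words or not words[-1].startswith("bridge"):
        return []
    return [MODIFIER_TO_TERM[w] for w in words[:-1] if w in MODIFIER_TO_TERM]


for term in terms_for_bridge_phrase("Continuous Multispan Concrete Girder Highway Bridges"):
    print(term)
```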
Create Rulebase - To Disambiguate
Preferred term: Wells (noun vs adverb)
Well*
IF (WITH "hydraul*" OR WITH "Hydro*" OR WITH "Aquifer*" OR WITH "Multiaquifer*" OR WITH "discharg*" OR WITH "pump*" OR WITH "stilling" OR WITH "flow*" OR WITH "water*" OR WITH "groundwater" OR WITH "Recirculation" OR WITH "Artesian")
USE Wells
ENDIF
Foundation*
IF (NOT (NEAR "success*" OR NEAR "research" OR NEAR "national science" OR NEAR "grant*" OR NEAR "president*" OR NEAR "ASCE foundation*" OR AROUND "engineering foundation" OR NEAR "economic" OR NEAR "prize*" OR NEAR "award*" OR NEAR "education*" OR NEAR "campaign*" OR AROUND "reason foundation" OR AROUND "national science foundation" OR AROUND "nsf" OR NEAR "job*" OR NEAR "partner*" OR NEAR "organization*" OR NEAR "scholar*"))
IF (WITH "bridge*" OR AROUND "bridge foundation*")
USE Bridge foundations
ENDIF
IF (WITH "dam" OR WITH "dams" OR AROUND "dam foundation*")
USE Dam foundations
ENDIF
IF (NEAR "deep" OR AROUND "deep foundation*")
USE Deep foundations
…
Create Rulebase - To Disambiguate
If it is impossible to write a rule for a term, it may not be a good term.
Bubbles: water bubbles, air bubbles, gas bubbles, financial bubbles…
Possible homes: fluid dynamics, waste treatment, material science, soil mechanics…
Clue: if you have trouble placing a term in the taxonomy, you are likely to have trouble creating rules for it.
Disambiguation
Create Rulebase
Test*: test, tests, testing, testings, testify, testimony, testosterone
Wave*: waves, wavelength, wave length, wavelet, wavefront, waverider, waveguide…
Truncate text with care
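The over-matching risk is easy to demonstrate. The Python sketch below treats `Test*` as "test followed by any word characters", which is an assumed reading of the wildcard, and lists which words from a small sample it would catch.

```python
import re


def truncation_hits(stem, words):
    """Return the words that a truncated pattern such as 'test*' would match in full."""
    pattern = re.compile(re.escape(stem) + r"\w*", re.IGNORECASE)
    return [w for w in words if pattern.fullmatch(w)]


sample = ["test", "tests", "testing", "testify", "testimony", "testosterone", "contest"]
print(truncation_hits("test", sample))
```

Only "contest" escapes. The legal and medical words are caught along with the engineering ones, which is why a short stem usually needs either a longer truncation point or a conditional rule.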
Preferred Term: Workplace discrimination
Discriminat*
IF (WITH "age" OR WITH "minority" OR WITH "racial" OR WITH "race" OR WITH "disabilit*" OR WITH "senior" OR WITH "older" OR WITH "old" OR WITH "women" OR WITH "woman" OR WITH "diversity" OR WITH "dispute" OR WITH "equal*" OR WITH "female" OR WITH "male" OR WITH "workplace" OR WITH "African*" OR WITH "Hispanic")
USE Workplace discrimination
ENDIF
Text that hardly matches (need specifics)
Create Rulebase
Taxonomy Enrichment and Rule Building is a Process.
Another opportunity to fine-tune the taxonomy
Diffus*
IF (MENTIONS "transport" OR MENTIONS "concentration" OR MENTIONS "gradient" OR MENTIONS "advective" OR MENTIONS "equilibr*" OR MENTIONS "voc" OR MENTIONS "vocs" OR MENTIONS "volatile organic compound*" OR MENTIONS "water*" OR MENTIONS "moisture" OR MENTIONS "wave*" OR MENTIONS "flow" OR MENTIONS "chemical*" OR MENTIONS "molecul*" OR MENTIONS "soil*" OR MENTIONS "waste*" OR MENTIONS "filter*" OR MENTIONS "runoff" OR MENTIONS "run-off" OR MENTIONS "jet" OR MENTIONS "turbulen*" OR MENTIONS "gas" OR MENTIONS "emission*" OR MENTIONS "emit*" OR MENTIONS "air" OR MENTIONS "oxygen" OR MENTIONS "thermal" OR MENTIONS "solute*" OR MENTIONS "chloride*" OR MENTIONS "contamin*" OR MENTIONS "pollut*" OR MENTIONS "organic" OR MENTIONS "compound*" OR MENTIONS "nitri*" OR MENTIONS "ion" OR MENTIONS "ions" OR MENTIONS "dye" OR MENTIONS "dyes" OR MENTIONS "fluid*" OR MENTIONS "channel*" OR MENTIONS "river*" OR MENTIONS "stream*" OR MENTIONS "tidal" OR MENTIONS "hydro*" OR MENTIONS "hydrau*" OR MENTIONS "lake*" OR MENTIONS "bay" OR MENTIONS "bays" OR MENTIONS "ocean*" OR MENTIONS "coast*" OR MENTIONS "sediment*" OR MENTIONS "sea" OR MENTIONS "seas" OR MENTIONS "catchment*" OR MENTIONS "reservoir*" OR MENTIONS "estuar*" OR MENTIONS "sewage*" OR MENTIONS "flood*" OR MENTIONS "porous medi*" OR MENTIONS "concrete*" OR MENTIONS "bentonite" OR MENTIONS "cement*" OR MENTIONS "clay*" OR MENTIONS "advection*" OR MENTIONS "convection*" OR MENTIONS "eddy" OR MENTIONS "eddies" OR MENTIONS "flux")
IF (AROUND "voc" OR AROUND "vocs" OR AROUND "volatile organic compound*" OR AROUND "chemical*" OR AROUND "molecul*" OR AROUND "chlorid*" OR AROUND "nitri*" OR AROUND "ion" OR AROUND "ions" OR AROUND "polymer*" OR AROUND "species" OR AROUND "polyaromatic*" OR AROUND "hydrocarbon*" OR AROUND "aromatic*" OR AROUND "pah" OR AROUND "pahs" OR AROUND "dichloromethane*" OR AROUND "chloromethane*" OR AROUND "chemox")
USE Diffusion (chemical)
ENDIF
IF (AROUND "thermo*" OR AROUND "thermal" OR AROUND "thermodiffusion")
USE Diffusion (thermal)
ENDIF
IF (AROUND "porous" OR AROUND "porosity" OR AROUND "soil*" OR AROUND "clay*" OR AROUND "pore" OR AROUND "pores" OR AROUND "cement*" OR AROUND "concrete*" OR AROUND "bentonite")
USE Diffusion (porous media)
ENDIF
IF (AROUND "fluid*")
IF (WITH "turbulen*" OR WITH "eddy" OR WITH "eddies")
USE Turbulent diffusion
ELSE
ENDIF
IF (NOT (AROUND "voc" OR AROUND "vocs" OR AROUND "volatile organic compound*" OR AROUND "chemical*" OR AROUND "molecul*" OR AROUND "chlorid*" OR AROUND "nitri*" OR AROUND "ion" OR AROUND "ions" OR AROUND "polymer*" OR AROUND "species" OR AROUND "polyaromatic*" OR AROUND "hydrocarbon*" OR AROUND "aromatic*" OR AROUND "pah" OR AROUND "pahs" OR AROUND "dichloromethane*" OR AROUND "chloromethane*" OR AROUND "chemox" OR AROUND "thermo*" OR AROUND "thermal" OR AROUND "thermodiffusion" OR AROUND "porous" OR AROUND "porosity" OR AROUND "soil*" OR AROUND "clay*" OR AROUND "pore" OR AROUND "pores" OR AROUND "cement*" OR AROUND "concrete*" OR AROUND "bentonite" OR AROUND "fluid*" OR WITH "wave" OR WITH "waves"))
USE Diffusion
ENDIF
ENDIF
• It is impossible to build perfect rules.
• Noise (rules too general) or misses (rules too granular). Try to strike a balance.
• Be ready for the unexpected. Keep note of possible equivalent terms when you are not working on the taxonomy, e.g. “ring of fire”=Earthquakes, “la nina”, “el nino”, “polar vortex” =Climate change
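The balance between noise and misses can be tracked with two numbers per test document: precision (how many suggested terms were right) and recall (how many expected terms were found). The slides do not say how ASCE measured this, so the sketch below is an assumed evaluation approach with made-up example terms.

```python
def precision_recall(suggested, expected):
    """Compare MAI-suggested terms with the terms a human indexer assigned."""
    suggested, expected = set(suggested), set(expected)
    hits = suggested & expected
    precision = len(hits) / len(suggested) if suggested else 0.0  # low value = noise
    recall = len(hits) / len(expected) if expected else 0.0       # low value = misses
    return precision, recall


# Illustrative data only: one noisy suggestion ("Bubbles") and two missed terms.
suggested_terms = {"Corrosion", "Wells", "Bubbles"}
expected_terms = {"Corrosion", "Wells", "Lateral loads", "Span bridges"}
p, r = precision_recall(suggested_terms, expected_terms)
print(f"precision={p:.2f} recall={r:.2f}")
```

Rules that are too general push precision down; rules that are too granular push recall down.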
Taxonomy Enrichment and Rule Building is a Process
Questions?