Paul H. Cleverley
Robert Gordon University, Aberdeen, Scotland, UK.
GSA Annual Conference 24th October 2017; Seattle, USA.
Applying Text and Data Mining to Geological Articles: Towards Cognitive Computing Assistants
Background - Typical uses
Spatializing entities/concepts and associations, e.g. ‘mentions’ of Pre-Cambrian
Extracting integer and float data from unstructured text, e.g. a ppm value associated with a chemical element
For example, GeoDeepDive-supported papers (Peters et al. 2015; Liu et al. 2016; Yulaeva et al. 2017): stromatolite relationship to dolomite; link between cobalt and supercontinent assembly; extracting hydrogeological data
Cleverley (2017)
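As a rough illustration of the "integer and float data" use case, the sketch below pulls ppm values out of a sentence and associates each with the nearest element name. The element lookup, the nearest-mention rule and the sample sentence are all assumptions made for illustration; they are not taken from GeoDeepDive or from the talk.

```python
import re

# Hypothetical lookup; a real system would use a full element lexicon.
ELEMENTS = {"arsenic": "As", "cobalt": "Co", "gold": "Au", "copper": "Cu"}

# A number (integer or float), optional whitespace, then the unit "ppm".
PPM_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*ppm\b", re.IGNORECASE)


def extract_ppm(sentence):
    """Return (element, value) pairs for each ppm value in one sentence.

    Each value is paired with the element name whose start position is
    closest (in characters) to the start of the number.
    """
    lowered = sentence.lower()
    mentions = [
        (m.start(), name)
        for name in ELEMENTS
        for m in re.finditer(r"\b" + name + r"\b", lowered)
    ]
    results = []
    for match in PPM_PATTERN.finditer(sentence):
        if not mentions:
            continue  # a value with no element in the sentence is skipped
        _, nearest = min(mentions, key=lambda item: abs(item[0] - match.start()))
        text = match.group(1)
        value = float(text) if "." in text else int(text)
        results.append((nearest, value))
    return results


# Made-up sentence, for illustration only.
print(extract_ppm("Samples averaged 12.5 ppm arsenic, while cobalt reached 340 ppm."))
```

Nearest-mention matching is deliberately naive; sentences with several elements and values would probably need dependency parsing or hand-written patterns.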
But what else can we do? Examples using Python...
Learning by comparison: Discriminatory Search Term Word Associations
100,000+ Society of Petroleum Engineers (SPE), American Geosciences Institute (AGI), Geological Society of London (GSL)
Primary search query = submarine fan
Comparing secondary search terms:
- Miocene
- Eocene
Cleverley, P.H., Burnett, S. (2014)
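One way to sketch the idea of discriminatory word associations: collect the words that co-occur with the primary query plus each secondary term, then rank words by how much more often they appear with one secondary term than the other. The smoothed log ratio used as the score, the function names and the four toy documents are assumptions for illustration and are not the method of Cleverley and Burnett (2014).

```python
from collections import Counter
import math


def cooccurring_terms(docs, primary, secondary):
    """Count words in documents that mention both the primary and secondary term.

    Matching is by simple substring; each word counts once per document.
    """
    counts = Counter()
    for doc in docs:
        text = doc.lower()
        if primary in text and secondary in text:
            words = set(text.replace(",", " ").replace(".", " ").split())
            counts.update(words)
    return counts


def discriminating_terms(docs, primary, term_a, term_b, top_n=3):
    """Rank words by how much more strongly they associate with term_a than term_b."""
    a = cooccurring_terms(docs, primary, term_a)
    b = cooccurring_terms(docs, primary, term_b)
    exclude = set(primary.split()) | {term_a, term_b}
    scores = {}
    for word in set(a) | set(b):
        if word in exclude:
            continue
        # Add-one smoothing keeps the log ratio finite when a count is zero.
        scores[word] = math.log((a[word] + 1) / (b[word] + 1))
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


# Toy documents, invented for illustration only.
docs = [
    "Miocene submarine fan with turbidite channel sands",
    "Miocene submarine fan turbidite lobes",
    "Eocene submarine fan with slump deposits",
    "Eocene submarine fan slump and debris flow",
]
print(discriminating_terms(docs, "submarine fan", "miocene", "eocene"))
print(discriminating_terms(docs, "submarine fan", "eocene", "miocene"))
```

Words shared equally by both result sets (here "with") score zero and drop down the ranking, which is what makes the surviving terms discriminatory rather than merely frequent.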
Stimulating Serendipity (Discriminatory Word Associations)
“….word associations highlighted new and unexpected terms... This surprising result led us to consider a new geological element which could impact our business opportunity” Geologist Oil & Gas Company 2015
n=53
To what extent do current search interfaces in your organization facilitate serendipitous discovery?
42% - To a moderate/large extent
To what extent could word co-occurrence techniques facilitate serendipitous discovery?
75% - To a moderate/large extent
A Wilcoxon Signed Rank Test showed a statistically significant difference (p<0.05).
Cleverley, P.H., Burnett, S. (2014)
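For readers unfamiliar with the Wilcoxon signed-rank test mentioned above, here is a minimal pure-Python sketch using the normal approximation. The paired ratings are invented (they are not the n=53 survey responses), and the sketch applies no tie or continuity correction; for real analysis a library routine such as `scipy.stats.wilcoxon` would be the safer choice.

```python
import math


def wilcoxon_signed_rank(before, after):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Returns (W, p), where W is the smaller of the positive and negative
    rank sums. Zero differences are dropped and tied absolute differences
    share an average rank. No tie or continuity correction is applied.
    """
    diffs = [a - b for b, a in zip(before, after) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")

    # Rank the absolute differences, averaging ranks across ties.
    ordered = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[ordered[j + 1]]) == abs(diffs[ordered[i]]):
            j += 1
        average = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[ordered[k]] = average
        i = j + 1

    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    w = min(w_plus, w_minus)

    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w, p


# Hypothetical paired ratings on a 1-5 scale (same respondents, two questions).
current = [2, 3, 2, 1, 3, 2, 4, 2, 3, 1]
potential = [4, 4, 3, 3, 3, 4, 3, 4, 5, 3]
w, p = wilcoxon_signed_rank(current, potential)
print(w, p)
```

The test suits this kind of survey because each respondent answers both questions, so the ratings are paired and ordinal rather than independent and normally distributed.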
100,000+ Society of Petroleum Engineers (SPE), American Geosciences Institute (AGI), Geological Society of London (GSL). Some colour coding from NASA SWEET Ontology and others.
Question: Which is the most similar formation to the Kimmeridge Formation?
Word Vectors – very simple theory
Cleverley (2016) Digital Energy
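The "very simple theory" can be sketched as follows, on the assumption that a word vector here means a count of the words appearing near a term and that similarity is measured by cosine. The tokenised sentences and the placeholder names `formation_x` and `formation_y` are hypothetical; no claim is made about which real formation is most similar to the Kimmeridge.

```python
from collections import Counter
import math


def cooccurrence_vector(term, sentences, window=3):
    """Count the words within `window` tokens of each occurrence of `term`."""
    vector = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token == term:
                lo, hi = max(0, i - window), i + window + 1
                vector.update(t for t in tokens[lo:hi] if t != term)
    return vector


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(term, candidates, sentences):
    """Return the candidate whose context vector is closest to that of `term`."""
    target = cooccurrence_vector(term, sentences)
    return max(candidates, key=lambda c: cosine(target, cooccurrence_vector(c, sentences)))


# Invented sentences with formation names pre-joined into single tokens.
sentences = [
    "kimmeridge organic rich marine shale source rock",
    "formation_x organic rich marine shale",
    "formation_y fluvial sandstone reservoir",
]
print(most_similar("kimmeridge", ["formation_x", "formation_y"], sentences))
```

Terms that never appear together can still come out as similar, because the comparison is between their contexts rather than their direct co-occurrence.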
Find Similar
Similarity of entities
“I input the Zebbag Formation that I studied in Tunisia and it returned a lateral equivalent (in Libya) that I had not come across before.”
Geologist, Multi-National Oil and Gas Company (June 2016)
What are the analogues for xxx?
Cleverley (2016) Digital Energy
Adding more sophistication…
- Curation (lemmas, synonyms)
- NLP, e.g. ‘post Triassic’, ‘not porous’
- Word2Vec (Mikolov et al. 2013; Řehůřek 2014): using neural networks to generate richer and more complex representations of meaning in text (text embeddings)
- Using geoscience ontologies to enrich meaning and add logic for reasoning
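The NLP examples ‘post Triassic’ and ‘not porous’ hint at why plain word counts can mislead: without extra handling, both phrases would be counted as ordinary mentions of "Triassic" and "porous". One possible pre-processing step, sketched here with a hypothetical modifier list, joins the modifier to the following word so the pair becomes a single token before any vectors are built.

```python
import re

# Hypothetical modifier list; a curated geoscience lexicon would be larger.
MODIFIERS = ("post", "pre", "not", "non")

# A modifier as a whole word, then spaces or hyphens, then the next word.
MODIFIER_PATTERN = re.compile(
    r"\b(" + "|".join(MODIFIERS) + r")[\s-]+(\w+)", re.IGNORECASE
)


def join_modifiers(text):
    """Rewrite 'post Triassic' as 'post_triassic', 'not porous' as 'not_porous'."""
    return MODIFIER_PATTERN.sub(
        lambda m: f"{m.group(1).lower()}_{m.group(2).lower()}", text
    )


def tokenize(text):
    """Lower-case whitespace tokenizer applied after modifier joining."""
    return join_modifiers(text).lower().split()


print(tokenize("post Triassic sandstone that is not porous"))
```

Joined tokens such as `not_porous` then receive their own vectors, so a negated property is no longer treated as evidence for the property itself.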
More “related” to volcanics than limestone
More “related” to limestone than volcanics
Testing Hypotheses (word vector v word vector)
6,000+ Articles over 100 years of the Society of Economic Geologists (SEG) - (courtesy GeoScienceWorld)
Cleverley (2017)
R² = 0.2576
A weak correlation. Arid environments can lead to high pH (evaporation / desorption), which can lead to arsenic in groundwater. So the more arid the environment (less rainwater), the more likely arsenic may mobilize.
Axes: Word Vector (Arsenic); NOAA Annual Rainfall (mm)
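To show how a figure like R² = 0.2576 can be obtained, the sketch below computes the squared Pearson correlation between two lists, assuming the reported value comes from a simple linear fit of one variable against the other. Under that assumption, 0.2576 corresponds to a correlation coefficient of about 0.51 in magnitude; R² alone does not carry the sign. The function name and sample numbers are illustrative and are not the study data.

```python
def r_squared(xs, ys):
    """Square of the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov * cov / (var_x * var_y)


# Made-up values: per-state similarity score against annual rainfall (mm).
similarity = [0.40, 0.35, 0.20, 0.30, 0.10]
rainfall_mm = [250, 400, 900, 1100, 1300]
print(round(r_squared(similarity, rainfall_mm), 4))
```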
Testing Hypotheses (word vector v existing data)
Word Vector(US States)
6,000+ Articles over 100 years of the Society of Economic Geologists (SEG) - (courtesy GeoScienceWorld)
National Oceanic and Atmospheric Administration (NOAA) Environmental Data
Cleverley (2017)
Are all the conditions likely to be in place for …?
Labelled training data + skip-grams + geoscience ‘friendly’ lexicon
Using literature to help challenge individual cognitive biases and organizational dogma
Reports from United States Geological Survey (USGS) Petroleum Assessments
Cleverley (2017)
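Assuming "skip-grams" refers to word2vec-style (target, context) pairs, the sketch below shows how such pairs can be generated from a tokenised sentence. The function name, window size and sample tokens are illustrative; the talk does not specify how its features were built, and the term can also mean n-grams that skip over intervening words.

```python
def skipgram_pairs(tokens, window=2):
    """Return (target, context) pairs for each token and its neighbours within `window`."""
    pairs = []
    for i, target in enumerate(tokens):
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:  # a token is not its own context
                pairs.append((target, tokens[j]))
    return pairs


# Invented example sentence, already tokenised.
print(skipgram_pairs(["source", "rock", "maturity"], window=1))
```

Pairs like these, combined with labelled examples, could feed a classifier that estimates whether a report describes a given condition as present.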
Summary – Areas for further research
• Opportunities may exist to increase the propensity of ‘general purpose’ enterprise search user interfaces to facilitate serendipity.
• Combining text analytics & machine learning to address a specific work task to provide actionable insights.
www.paulhcleverley.com