© cortical.io inc. 2015
Empower your enterprise with language intelligence
free access at
api.cortical.io
contact: f.webber@cortical.io
who we are
• cortical.io inc. is a science startup in Vienna, Austria
• result of the CEPT project (Cortical Engine for Processing Text)
• advances in brain theory guided us to a fundamentally new approach for natural language processing
• we are investor backed in the second round
• we made semantic fingerprinting accessible, robust, scalable, intuitive and easy to use
big (text) data
• businesses, organizations and governments are threatened by the big data explosion.
• a substantial part of this data consists of text.
• computers ‘understand’ numbers but ignore the meaning of language
the downsides
existing semantic systems are…
…hard to build (sometimes impossible)
…inaccurate & fragile (in real-world use)
…expensive to buy (licenses & services)
…tricky to integrate (setup, tuning, training)
…laborious to run (metadata management)
…hard to maintain (dictionaries, ontologies)
Semantic Fingerprinting
• semantic fingerprinting bridges the gap between natural language processing and knowledge management
• language is represented using the same data format as found in the neocortex (mammalian brain)
• the cortical.io Retina behaves like a sensorial organ for language
• meaning is embodied in thousands of self-learned semantic features
Semantic Fingerprinting
[figure: semantic fingerprint of "organ", shown with the words piano, church and liver]
• the cortical.io Retina converts every word into its semantic fingerprint
• the fingerprints allow direct semantic comparison of the meanings between words
• similar fingerprints have similar meanings
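A minimal sketch of how two word fingerprints might be compared, assuming a fingerprint is modeled as a set of active positions. The slides do not name the similarity measure, so the Jaccard overlap used here is an assumption, and the toy positions are made up rather than real Retina output.

```python
def overlap_similarity(fp_a: frozenset, fp_b: frozenset) -> float:
    """Jaccard overlap: shared active positions / all active positions."""
    if not fp_a and not fp_b:
        return 0.0
    return len(fp_a & fp_b) / len(fp_a | fp_b)


# Hypothetical toy fingerprints (made-up positions, not real Retina output).
organ = frozenset({1, 2, 3, 10, 11, 20})  # shares positions with both senses
piano = frozenset({1, 2, 3, 4})           # musical-instrument sense
liver = frozenset({20, 21, 22})           # body-part sense

sim_piano = overlap_similarity(organ, piano)
sim_liver = overlap_similarity(organ, liver)
print(f"organ vs piano: {sim_piano:.3f}")
print(f"organ vs liver: {sim_liver:.3f}")
```

In this toy data "organ" overlaps with both "piano" and "liver", while "piano" and "liver" share nothing, which mirrors the ambiguity the slide illustrates.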
Semantic Similarity
[figure: fingerprints of "cat", "dog" and their overlap "cat+dog", with regions labeled home & family aspects, cat-specific aspects, dog-specific aspects and biology aspects; value shown: 38%]
word sense disambiguation
[figure: word sense disambiguation. Labels shown: rock, apple, computer; sense 1, sense 2, sense 2a, sense 2 …m, sense …n. Context-term groups shown: songwriter, vocals, spector, airplay, album; seeds, flowers, pollinators, pests, insects, trees, fruit; vegetables, berries, ingredients, sugar, diet, food; macintosh, microsoft, linux, software, hardware]
Text Fingerprinting
• word fingerprints can be stacked together to form fingerprints of any piece of text.
• all semantic fingerprint properties remain: similar fingerprints mean similar texts.
• representation is made through more than 16K features.
[figure: the word fingerprints of "teens like to hear music on their mobile phones" are combined by aggregation + sparsification into a single text fingerprint]
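The aggregation + sparsification step could be sketched as follows. This is an illustration under stated assumptions, not the Retina's actual procedure: fingerprints are modeled as sets of active positions, aggregation counts how many word fingerprints activate each position, and sparsification keeps only the `max_active` most frequent positions. The function name, the `max_active` parameter and the toy positions are hypothetical.

```python
from collections import Counter


def text_fingerprint(word_fps, max_active):
    """Stack word fingerprints, then keep only the most frequent positions."""
    counts = Counter()
    for fp in word_fps:
        counts.update(fp)  # aggregation: count activations per position
    # Sparsification: rank by count (descending), ties broken by position.
    ranked = sorted(counts, key=lambda pos: (-counts[pos], pos))
    return frozenset(ranked[:max_active])


# Hypothetical toy word fingerprints (made-up positions).
word_fps = [
    frozenset({1, 2, 3}),  # e.g. "teens"
    frozenset({2, 3, 4}),  # e.g. "music"
    frozenset({3, 4, 5}),  # e.g. "phones"
]
fp = text_fingerprint(word_fps, max_active=3)
print(sorted(fp))
```

Because the result has the same form as a word fingerprint, it can be compared with any other fingerprint in the same way.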
Text Similarity 1
• teens like playing good music with their mobile phones
• you can also consume chart hits with your notebook
similarity: 27%
Text Similarity 2
• teens like playing good music with their mobile phones
• the fishermen are sailing out of the harbor
similarity: 9%
NLP Functionality: Search
[figure: an example document serves as the query; the similarity engine compares it with the document index and ranks the result set, returning the most similar documents ordered along the user's information need]
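A rough sketch of search by example document, assuming every indexed document already has a fingerprint (modeled as a set of active positions) and that ranking uses the raw overlap count. The function name, the index layout and the toy data are hypothetical.

```python
def rank_by_overlap(query_fp, index):
    """Return document ids ordered by overlap with the query fingerprint."""
    scored = [(len(query_fp & fp), doc_id) for doc_id, fp in index.items()]
    # Highest overlap first; ties broken alphabetically by document id.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for score, doc_id in scored if score > 0]


# Hypothetical document index (made-up positions).
index = {
    "doc_music": frozenset({1, 2, 3, 4}),
    "doc_charts": frozenset({3, 4, 9}),
    "doc_fishing": frozenset({7, 8}),
}
query = frozenset({1, 2, 3, 5})  # fingerprint of the example document
results = rank_by_overlap(query, index)
print(results)
```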
NLP Functionality: classification
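[figure: the words cow, elephant, dog, spider and frog are compared with the fingerprint of the expression below; the most relevant matching area is highlighted]
Literally: “mammal or mammals or mammalian”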
Demos @ cortical.io
Demonstrations
Evaluation
There are very few comparable algorithms: a couple of academic ones that cannot be readily used for production purposes and Google’s Word2Vec.
• The MEN Test Collection: http://clic.cimec.unitn.it/~elia.bruni/MEN.html
• The RG-65 Test Collection: http://www.aclweb.org/aclwiki/index.php?title=RG-65_Test_Collection_(State_of_the_art)
• The WordSimilarity-353 Test Collection: http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/
• Yu & Dredze 2014: http://arxiv.org/pdf/1411.4166.pdf
• Distributed representations of words and phrases: http://papers.nips.cc/paper/5021-di
disciplines of language intelligence
• locate documents
• find web content
• match people
• identify products
• monitor competitors
• file business information
• discover new knowledge
• track customer satisfaction
• avoid duplication of work
• advertise on the Internet
• mine for evidence
• improve security
business applications
“Anything that can be expressed with text can be matched:
- products with LinkedIn profiles,
- tweets with Facebook timelines,
- job descriptions with CVs …”
application: streaming text filter
• convert the twitter firehose into a stream of semantic fingerprints
• filter the stream (match / not matching) to generate a realtime content sub-stream
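The streaming filter could look roughly like the generator below. It is a sketch under assumptions: `toy_fingerprint` is a hypothetical stand-in for a real text-to-fingerprint conversion, and the `min_overlap` threshold is an invented parameter, since the slides do not say how a match is decided.

```python
# Hypothetical word-to-positions table standing in for a real fingerprint source.
TOY_POSITIONS = {
    "music": {1, 2},
    "album": {2, 3},
    "harbor": {8},
    "fishermen": {8, 9},
}


def toy_fingerprint(text):
    """Union of the toy positions of all known words in the text."""
    positions = set()
    for word in text.lower().split():
        positions |= TOY_POSITIONS.get(word, set())
    return frozenset(positions)


def filter_stream(messages, fingerprint_of, filter_fp, min_overlap):
    """Yield only the messages whose fingerprint overlaps the filter enough."""
    for text in messages:
        if len(fingerprint_of(text) & filter_fp) >= min_overlap:
            yield text


stream = ["new album out", "fishermen leave harbor", "music album charts"]
filter_fp = frozenset({1, 2, 3})  # hypothetical "music" filter fingerprint
matched = list(filter_stream(stream, toy_fingerprint, filter_fp, min_overlap=2))
print(matched)
```

A generator keeps memory use constant regardless of stream length, which suits an unbounded source such as a message firehose.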
creating filter fingerprints
• words: simple words, keywords
• text: text or text-documents of any size
• profile descriptions or message postings from social media
• the expression builder allows interactive design of boolean specifications like: jaguar - Porsche = tiger
• the fingerprint editor allows the “drawing” of fingerprints. The meaning of the resulting fingerprint can be monitored through the context terms
[figure: resulting filter fingerprint]
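The boolean specification "jaguar - Porsche = tiger" can be illustrated with set operations, assuming fingerprints are sets of active positions and that "-" means removing the positions of the second term. That reading of the expression builder is an assumption, and the toy positions and the `closest_term` helper are hypothetical.

```python
# Hypothetical toy fingerprints: positions 1-4 stand for "big cat" features,
# positions 10-13 for "sports car" features.
jaguar = frozenset({1, 2, 3, 10, 11, 12})
porsche = frozenset({10, 11, 12, 13})
tiger = frozenset({1, 2, 3, 4})


def closest_term(fp, candidates):
    """Name of the candidate fingerprint sharing the most positions with fp."""
    return max(candidates, key=lambda name: len(fp & candidates[name]))


# Subtracting "Porsche" removes the car-related positions from "jaguar".
jaguar_minus_porsche = jaguar - porsche
closest = closest_term(jaguar_minus_porsche, {"tiger": tiger, "porsche": porsche})
print(sorted(jaguar_minus_porsche), closest)
```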
application: profile matching
• match people by their profiles
• no keyword- or field-based string matching limitations
• semantic similarity measure to compare professional profiles
• different profiles for professional, leisure, interests, sports, etc.
[figure: a profile fingerprint compared with an activity fingerprint]
application: product recommendation
• create fingerprints from product descriptions
• find similar products by matching description fingerprints
• create customer fingerprints from purchased products
[figure: a product description fingerprint is matched to produce recommendations of similar products]
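One possible reading of the recommendation flow, sketched with toy data: the customer fingerprint is taken to be the union of the fingerprints of purchased products, and unpurchased products are ranked by their overlap with it. Both choices are assumptions, as are the function names and the catalog contents.

```python
def customer_fingerprint(purchased_fps):
    """Union of the description fingerprints of all purchased products."""
    combined = set()
    for fp in purchased_fps:
        combined |= fp
    return frozenset(combined)


def recommend(catalog, purchased_ids, top_n):
    """Rank unpurchased products by overlap with the customer fingerprint."""
    customer = customer_fingerprint(catalog[pid] for pid in purchased_ids)
    candidates = [
        (len(customer & fp), pid)
        for pid, fp in catalog.items()
        if pid not in purchased_ids
    ]
    candidates.sort(key=lambda item: (-item[0], item[1]))
    return [pid for score, pid in candidates[:top_n] if score > 0]


# Hypothetical catalog of product description fingerprints.
catalog = {
    "headphones": frozenset({1, 2, 3}),
    "speaker": frozenset({2, 3, 4}),
    "mp3_player": frozenset({1, 3, 5}),
    "fishing_rod": frozenset({8, 9}),
}
suggestions = recommend(catalog, purchased_ids={"headphones"}, top_n=2)
print(suggestions)
```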
cortical.io advantages
simplicity
• no prior expertise in natural language processing or linguistics is needed.
• easy and intuitive definition of semantic filters or classifiers.
• all types of text (words, sentences, paragraphs, chapters, books, etc.) are processed in the same way using fingerprints.
• easy expansion to other languages by switching to any of the available language retinas.
• zero configuration and no parameter tweaking needed.
cortical.io advantages
efficiency
• semantic fingerprints are small, 2K-byte binary vectors.
• only binary operators are used - no floating point operations needed.
• linear scalability as the engine takes advantage of a parallel computing infrastructure (multicore, cluster, virtualization) to match any performance needed.
• high throughput as complex NLP operations are executed in a single step and are therefore much faster than with traditional statistical systems.
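To illustrate why binary operators are enough, the sketch below stores a fingerprint as a single Python integer used as a bit vector and compares two fingerprints with a bitwise AND followed by a bit count. The exact size of 16 × 1024 bits (2048 bytes) is an assumption derived from the "16K features" and "2K byte" figures in the slides; the helper names are hypothetical.

```python
FINGERPRINT_BITS = 16 * 1024  # assumed size: 16K positions = 2048 bytes


def from_positions(positions):
    """Build a bit vector (stored in a Python int) from active positions."""
    bits = 0
    for pos in positions:
        if not 0 <= pos < FINGERPRINT_BITS:
            raise ValueError(f"position out of range: {pos}")
        bits |= 1 << pos
    return bits


def overlap_count(a, b):
    """Number of shared active bits: bitwise AND, then count the set bits."""
    return bin(a & b).count("1")


a = from_positions([0, 5, 16383])
b = from_positions([5, 42, 16383])
shared = overlap_count(a, b)
packed = a.to_bytes(FINGERPRINT_BITS // 8, "big")  # fixed-size binary form
print(shared, len(packed))
```

No floating-point arithmetic is involved at any step, only shifts, ORs, ANDs and a bit count.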
cortical.io advantages
quality
• higher precision on NLP operations due to the large number of semantic features used (>16K).
• automatic disambiguation of human language due to the novel approach.
• full language independence: equally high-quality results in all languages due to complete avoidance of any statistical language models.
• no unintended bias, as no human input is needed as a gold standard.
• automatic update, as new words and concepts can be added continuously.